Chapter 1
The Characteristics and Diversity of Experimentation in the Sciences


Catherine Allamel-Raffin, Jean-Luc Gangloff, and Yves Gingras 


This book is about experimentation in the sciences. It presents a panorama of the 
various ways of experimenting in both the natural sciences (physics, chemistry, 
biology, etc.) and the social and human sciences (psychology, economics, sociology, 
etc.) as well as in those of mixed or uncertain status such as medicine, management 
sciences and archaeology. More precisely, it provides reference points as to the 
nature, the concrete manifestations and the purposes of those activities designated as 
“experimentation”. One might think that this book is the continuation of a long list of 
introductory works that approach the question of experimentation employing the 
expertise of the philosopher, the historian and the social scientist. There are, how- 
ever, very few studies, apart from Observation and Experiment in Natural and 
Social Sciences (Galavotti 2004), that, in a single volume, cover the entire spectrum 
of experimental practices ranging from physics and chemistry to sociology, archae- 
ology, psychology, economics and medicine. This absence can be explained by 
several factors including the fact that philosophers focus more on the natural 
sciences, and especially physics, than on the human and social sciences. In addition 
to this selective focus, much attention has been paid to assessing the validity of the 
results obtained to the detriment of the experimental methods themselves. There are, 
of course, some case studies, most often drawing on experiments considered classical
or “crucial” in the history of science such as Galileo’s inclined plane, Blaise
Pascal’s experiments on the weight of air conducted on the Puy-de-Dôme, or Pasteur
and Pouchet’s experiments on spontaneous generation.


C. Allamel-Raffin · J.-L. Gangloff
AHP-PReST, University of Strasbourg, Strasbourg, France
e-mail: catherine.allamelraffin@unistra.fr; gangloff@unistra.fr


Y. Gingras
CIRST, Université du Québec à Montréal, Montréal, QC, Canada
e-mail: gingras.yves@uqam.ca


© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
C. Allamel-Raffin et al. (eds.), Experimentation in the Sciences, Archimedes 72,
https://doi.org/10.1007/978-3-031-58505-0_1




Going beyond these specific cases, the aim here is to think about the specificity of
experimentation across the diversity of scientific practices concerning objects of very
different natures. We will consequently see that the expression “scientific method”
remains vague and acquires a precise operational meaning only with regard to the
nature of the objects under investigation. What, then, do we know about the
relative place accorded to experimentation, observation or simulation in various
scientific practices? In what ways and in what capacity does experimentation provide
evidence or proof with regard to objects as different as a star, a cell, an organ of the
human body or a political riot? The answers to such questions will allow us,
we believe, to identify the common basis of experimentation in the different sciences,
while also enabling us to underscore the limits of a simplistic
conception according to which there is only one way to provide proof, regardless of
the particularities of the object under investigation. Since experimentation confers a
stamp of scientificity, among other things, it is understandable that it is often the
object of criticism, and that those who believe that “science” is both unified and
singular also believe that experimentation is the same for all the “real” sciences.
Though one can find examples of the experimental approach in Antiquity (Grmek
1997), it is fair to say that the experimental approach became systematic in the
natural sciences during the seventeenth century, and the humanities and social
sciences have also come to invoke it, in addition to observation, to justify their
status as sciences. What methodologies do these disciplines adopt? Is there continuity
from one discipline to another? What about the “creation” of phenomena, as is
common in physics or chemistry? Is such creation also at work in other disciplines
such as medicine and the human and social sciences?

By experimentation, we mean a type of activity based on the voluntary, systematic
and controlled modification of the conditions of the natural sequence of phenomena
in order to determine which parameters contribute to producing a given
effect (Dupouy 2011; Nadeau 1999; Soler 2019). More precisely, modification
implies being able to: 1/ isolate variables; 2/ manipulate them, being able,
in principle, to control each variable independently of all the others; 3/ reproduce
their effects. Reproduction ensures that the phenomenon is real, stable and not a simple
artefact of the method, the instrument used or a poorly controlled environment.
Experimentation is thus distinguished from observation, which is more passive and
does not voluntarily disturb the observed phenomenon. As we will see, although the
two types of scientific activity (observing and experimenting) are found in most of
the sciences, contemporary disciplines are increasingly experimental and equipped
with instruments; they are rarely only observational. And although one might
perceive a continuity between daily practice, or common-sense experimentation,
and scientific experimentation, the former is generally less systematic, more individual
and sporadic, and not part of a collective argumentation, whereas the latter is
subject to the institutionalized rules of a scientific community. We are interested here
only in scientific experimentation, which is indeed subject to strong normative
constraints linked to the collective character of institutionalized science. This
normativity, inherent to scientific research, is present in the dynamics of argument
and counterargument (Bachelard 1938; Bourdieu 2001; Gingras 2021). Debates
among researchers concern, in particular, the conditions for the effective realization
of experiments (manipulative skill, methodological rigor, etc.). They also concern
the choices to be made regarding the characteristics of the relevant experimental
devices. While an individual testing hypotheses at home does not face
other scientists who can contest the results, the scientist is subject to the “norms that are
expressed in the form of prescriptions, prohibitions, preferences and legitimated in
terms of institutional values that will be internalized to varying degrees by the
scientists” (Darmon and Matalon 1986, p. 209).

Under the influence of Pierre Duhem and Karl Popper, philosophy of science has 
long placed experimentation under the umbrella of theory. Even the constructivist 
and relativist sociology of science has often taken up Duhem’s thesis and insisted on 
the fact that all experimentation or observation is necessarily linked to an underlying 
theory (theory-laden). Since the 1980s, however, a great deal of research has 
criticized this theory-oriented conception of scientific practice and highlighted the 
fact that observation and experimentation have a certain autonomy with respect to 
theories and that their purpose is not simply to test or refute them (Hacking 1983, 
2004; Franklin 1986, 2016; Franklin and Perovic 2021; Karaca 2013). This relative 
autonomy of experimentation is not only logical, but also manifests itself at the level 
of social organization and the division of labor, notably in the form of the develop- 
ment of communities of experimenters (Galison 1987) and instrumentalist commu- 
nities that work less on the phenomena than on the instruments used to produce or 
detect them (Joerges and Shinn 2001). Thus, a profound theoretical change can occur 
without any change in the experimental facts, or even in the experimental instru- 
ments and devices. Conversely, a major change in instrumentation can occur without 
the theoretical framework of the phenomenon studied being modified. This is what 
Thomas Nickles calls a disruptive change (such as the invention of the scanning 
tunneling microscope), the effects of which do not result in the formation of a new 
paradigm in the sense of Kuhn, but instead generate a new space of investigation 
(Nickles 2008). 

The assertion, long presented as central to the philosophy of science, that
experimentation and observation are intended only to test or disprove theories
reduces experimentation to a mere means, necessary but
unproblematic in the context of logical analyses of scientific discovery. On the
social level, this is reflected in the superiority conferred on theorists over experimenters,
the latter rarely being rendered visible in media accounts of scientific discovery.
Once the relative autonomy of experimentation has been accepted, we can ask what 
other purposes can be attributed to experimentation within the framework of scien- 
tific investigation. Allan Franklin, for example, has noted that experimentation plays 
many other significant roles in science. These roles include “exploratory experi- 
ments designed to investigate a subject for which a theory does not exist so that a 
theory may be formulated; experiments that help to articulate an existing theory; 
experiments that call for a new theory either by demonstrating the existence of a new 
phenomenon in need of explanation or by demonstrating that an existing theory is 
wrong; experiments that provide evidence for entities involved in our theories or 
new entities; experiments that measure quantities that are of physical interest such as
Planck’s constant or the charge of the electron; and experiments that have a life of 
their own, independent of high-level theories” (Franklin 2016, pp. 1-2). Experimen- 
tation often requires instrumentation of varying sophistication, and one notable 
effect of these instruments is to stimulate the creation of new concepts (Gingras 
and Godin 1997, p. 152). Instruments also make it possible to create new phenomena
that are otherwise not directly observable or that do not exist as such in nature (or only
rarely¹), such as the Josephson effect, which manifests itself by the appearance of a
current between two superconducting materials separated by a thin layer of
non-superconducting insulating material (Hacking 1983, pp. 228-229). Beyond the
inevitable variations arising from the nature of objects, the essence of experimenta- 
tion therefore lies first and foremost in the idea of modifying and controlling vari- 
ables, and not in the idea of confirming or refuting a theory or a phenomenon. This 
idea appears notably in Claude Bernard’s work when he proposes to distinguish 
between experimentation in the strict sense (the sense that interests us here) and 
experimental reasoning (or experimental method). For Bernard, “observation is the 
investigation of a natural phenomenon and experiment is the investigation of a 
phenomenon modified by the investigator” (Bernard 2008, p. 123). Experimental 
reasoning, as he defines it, is more encompassing and “is nothing other than 
reasoning by means of which we methodically submit our ideas to the experience 
of facts” (Bernard 2008, p. 103). It is therefore the same in the observational sciences 
as in the experimental sciences (ibid., p. 125). 

Although the conceptual distinction between observation and experimentation is
essential, it does not imply the independent existence of “observational sciences”
and “experimental sciences”. Reality is more complex, and most sciences use both
observational and experimental methods. While astronomy is, for obvious reasons,
mostly observational and nuclear physics experimental, chemistry almost always
requires an intervention to combine substances, and biology in the broad sense is
also now more experimental than in the days of early botany and zoology, when it
was largely observational. Even archaeology, which one might imagine to be based
solely on the more or less fortuitous discovery of artifacts (and therefore observational
in this sense, because it does not produce these artifacts), can also be experimental,
as the chapter of this book dedicated to this discipline shows. The situation is also
complex in most of the social sciences, which pursue actual scientific research and
not simply social activism. The management sciences, as well as economics, also
claim to experiment and not just to observe. Through these few examples, the notion
of modification appears to be the essential characteristic of scientific experimentation,
valid for all the objects investigated: atoms, molecules, stars, galaxies,
living cells, individuals, social communities, etc. This characteristic feature of
experimentation allows for a great diversity of practices linked to diverse contexts.


1 Indeed, one cannot exclude rare natural occurrences, such as what seems to have been
a natural nuclear reactor in Gabon, which operated in a very particular geological context
some two billion years ago. See Gauthier-Lafaye et al. (1997).
Thus, Ian Hacking has offered “a taxonomy of the internal elements of an experiment”
and has identified fifteen items associated with scientific experimentation,
which he has collected into three categories (Hacking 1992, pp. 44-45). Within the
category of “ideas”, he distinguishes: 


— questions, 

— background knowledge, 

— high-level theories in relation to the subject, which have no direct experimental 
consequences, 

— “topical hypotheses” (which make it possible to establish bridges between the 
high-level theories and experimentation), 

— models related to the apparatus. 


“Things” are divided into: 


— the “target” (the substance or population studied),

— the source of modification (that which alters or interferes with the target),

— the detectors,

— the tools, 

— the data generators. 


Finally, with regard to “inscriptions”, Hacking discriminates between:


— the data, 

— their assessment (statistical estimation of the probability of error, estimation of 
systematic errors due to the equipment), 

— their reduction, 

— their analysis, 

— their interpretation. 


As the informed reader will recognize, this taxonomy includes the usual items cited by
philosophers of science influenced mainly by physics: “high-level theories”, “topical
hypotheses”, “models related to the apparatus”, together with the distinction between
“detectors” and “data generators” (which presupposes the existence of complex
and sophisticated instrumentation), not to mention the modalities of data processing.
Nonetheless, it can be argued that while the list proposed by Hacking does identify
the common elements of an ideal-typical definition of scientific experimentation,
certain traits fade or disappear entirely depending on the discipline and specialty
examined, as the contributions gathered here show. In addition, entirely new traits
emerge that are not on Hacking’s list. In psychology, for example, an experiment
involves complex data collection techniques, but it does not necessarily require the
elaboration of topical hypotheses and does not necessarily use detectors, which are a
source of multiple problems regarding the recording of data, etc. Conversely, social
psychology relies on control groups, which do not appear on Hacking’s list, to
guarantee the validity of the results obtained.
Even within physics, some experiments do not contain all the items of the ideal-typical
definition: in exploratory experimentation, for instance, there is no high-level
theory, since this type of experimentation aims precisely at constituting a new
theory from the investigation of little-known phenomena.
Hacking himself emphasizes the item he calls the “source of modification”,
which brings us back to the requirement of modification presented above as the only 
necessary and sufficient condition that any experiment must satisfy. In principle, the 
source of modification can itself take many different forms. But in keeping with his 
view of experimentation in physics as the prototype of experimentation in the 
laboratory sciences, Hacking writes: 


There is usually apparatus that in some way alters or interferes with the target. In certain 
branches of physics, this is most commonly a source of energy. Traditional inorganic 
chemical analysis modifies a target by adding measured amounts of various substances, 
and by distillation, precipitation, centrifuging, etc. (1992, p. 46) 


In other words, it is a question of intervening on things by means of instruments. 
From this point of view, the sciences furthest from this conception of experimenta- 
tion are the humanities and the social sciences. For this reason, one should not place 
too much emphasis on instruments to define experimentation and should rather stick 
to the idea of intervention. One can, for example, disrupt the behavior of a group of 
monkeys simply by unexpectedly throwing a bunch of bananas at them and observing
their reactions! Although physics is in fact the science most heavily reliant on
instruments, instrumentation does not by itself guarantee the “scientificity” of an experiment.
One can indeed imagine multiple subtle and non-invasive means to modify mental 
states or individual or collective human behaviors. In social psychology, for exam- 
ple, one can manipulate the characteristics of the participants’ social environment in 
various ways. 

Depending on whether experimenters intervene on quarks or benzene or examine 
the behavior of credit buyers or an individual subjected to group pressure, the 
modalities of the intervention will change. Ways of investigating vary according 
to the objects of study, depending on their assumed or known ontological charac- 
teristics. As the contributions to this book show, it is indeed the nature of the object 
under investigation that dictates the concrete modalities of the experiment and it is 
the dynamics of the exchanges between the members of the community that vali- 
dates the most robust methods that, in turn, allow researchers to produce knowledge 
specific to each discipline. We can easily surmise that the ethical consequences of 
experimentation are not the same for a protocol that aims to confine a set of atoms in 
a magnetic field to observe their behavior when subjected to various constraints and 
a social psychology protocol that confines humans to study their reactions to 
authority, as in the famous Milgram experiment (Milgram 2017). The same is true 
of the reproducibility issues that have surfaced in recent years, particularly in social 
psychology (Allison et al. 2016; Peterson 2021). While electrons are considered
identical and indistinguishable from one another, the same cannot be said of human
beings, for whom it is difficult to fully control all the variables and environments in
which they live. Hence the need to adapt the experimental method to the specific
nature of the objects studied.

In sum, beyond the positivist dream of a unified science, the diversity of objects in 
the world argues for a broader conception of scientificity based on a conception of 
experimentation adapted to the kind of objects studied: people and societies are not 
electrons or mice. This is what we believe the contributions gathered in this book
show, covering a wide range of objects and experimental practices in a great 
diversity of disciplines, all aiming at producing valid scientific knowledge, but 
always subject to the critical scrutiny of the active members of these multiple 
scientific communities, a scrutiny that alone ensures the value of the knowledge 
produced thanks to ever more diversified and imaginative experiments. 
Chapter 2
Experimentation in Physics


Yves Gingras 


This chapter presents the different purposes of observation and experiment in 
physics using examples that allow us to grasp the historical transformations linked 
to the development of instrumentation. We cover both the observational and exper- 
imental aspects of this discipline, which range from astronomy and astrophysics to 
nuclear and particle physics, including optics and solid-state physics. 

As noted in the introduction, philosophy of science has long privileged theory and 
considered experimentation as a self-evident activity, aimed only at confirming 
or disproving theories. Since the 1980s and under the decisive influence of Ian 
Hacking’s work, historians and philosophers of science have examined the different 
purposes of experimentation and shown that they cover a much broader spectrum 
than mere confirmation or refutation of theories. 

In the following sections, we will analyze in greater detail the purposes of
observations and experiments in physics and their relations with theory, the question
of the reproduction of results and the experimenter’s regress, all the while
considering the diversity of the objects studied. But first it is useful to briefly
recall some distinctions concerning theory, which will facilitate the understanding
of the relations between theory and experimentation.


2.1 Different Types of Theories 


Philosophers of science distinguish, under various names, three kinds of
theory. First, there are theories said to be generic (or “principle theories”, to use the
term suggested by Einstein; Lange 2014), which frame a broad and diverse set of
phenomena, such as the theory of special relativity or classical thermodynamics.
Then there are theories that concern a more circumscribed set of phenomena at a
specific scale or a particular entity, such as light or elementary particles. Consider, in
this regard, Maxwell’s electromagnetic theory, which is supposed to explain all the
properties of light, or quantum mechanics, which is supposed to account for all
phenomena at the atomic scale. In principle, such theories must be consistent with
generic theories. This constraint means, for example, that the Schrödinger equation of
quantum mechanics is replaced by the Dirac equation in relativistic quantum
mechanics and that both must obey the principle of conservation of energy, which
is the basis of thermodynamics. Finally, there are theories of how a measuring
instrument functions, which are most often, if not always, different from and
independent of the theory of the phenomenon being measured. A simple example
is Galileo’s telescope, whose optical theory, proposed by Kepler in 1611 (Malet 2010),
is independent of the theory of the planets or, more broadly, of the cosmological
model adopted.


2.2 Observation 


Even though recent work in philosophy of science privileges experimentation to the 
point of absorbing observation within its sphere of action, especially when obser- 
vation is based on instruments, observation without experimentation remains the rule 
in domains that are not accessible to direct manipulation, such as astronomy and 
astrophysics. We must therefore attend to observation in order to understand its 
characteristics. 

Knowledge of the physical world can be gained through direct observation (with 
the naked eye) or indirect observation (through instruments such as telescopes and 
optical microscopes). Astronomy is the oldest observational science, the observational
science par excellence. Long before rockets could deliver artificial devices to comets
or celestial bodies (the Moon, Mars, etc.) so that experiments could be conducted on
them, and before telescopes were available, astronomy had remained for thousands
of years a science of naked-eye observation. The Babylonian lists of the positions of
the main observable planets against the background of the starry sky provided the
oldest systematic observations. The regularities that the scribes deduced from these
series permitted the construction of arithmetic models for predicting future positions.
This is a good example of the ability to predict the future positions of observed objects
by extrapolation from data that make regularities visible. One can of course debate
whether this is “science”, but this knowledge clearly has a theoretical aspect because
the predictions are indeed made by calculation. In this case, the observations led to
predictions without any spatial model of the positions of the planets, such as would be
used later. The planets were simply projected onto the background of the sky, without
any question of their relative distances in three-dimensional space.

With the conceptualization of a spherical cosmos in three dimensions by the
ancient Greeks, a spatial dimension appeared that made it possible to think of the
planets in terms of their distance from the Earth. This allowed Aristarchus, for
example, to calculate how much farther the Sun is from the Earth than the Moon, from
not very precise but relatively simple angular measurements (Aristarchus of Samos
2003). Once the Ptolemaic theory of the cosmos was established, systematic
observation of the orbits of the planets and comets was used to test its validity. Thus,
using more precise angular measurements produced by relatively simple but large
instruments, Tycho Brahe (1546-1601) proposed that the sky is a continuous fluid and
is not composed of crystalline spheres, as many of his contemporaries believed. Here,
systematic observation allowed the production of new ideas to better account for
celestial phenomena. This was again the case with Johannes Kepler, who had been
tasked with finding order in the observational data on the planet Mars collected by
his master Tycho. After years of hard work, he finally derived his three laws,
the so-called Kepler’s laws. Today we would say that Kepler “modeled” Brahe’s
data (Brackenridge 1982).
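To show how a single angular measurement can yield a relative distance, here is the geometry of Aristarchus’s argument as it is usually reconstructed. The 87° angle is the value traditionally attributed to him; the symbols θ, d_Sun and d_Moon are introduced here only for illustration. At half moon, the angle Earth-Moon-Sun is a right angle, so the Earth, Moon and Sun form a right triangle in which θ is the angle between the Moon and the Sun as seen from the Earth:

$$
\frac{d_{\text{Sun}}}{d_{\text{Moon}}} = \frac{1}{\cos\theta},
\qquad
\theta = 87^\circ \;\Rightarrow\; \frac{1}{\cos 87^\circ} = \frac{1}{\sin 3^\circ} \approx \frac{1}{0.0523} \approx 19
$$

This is consistent with his conclusion that the Sun is between 18 and 20 times farther away than the Moon. The method is sound but extremely sensitive to the measured angle: the true angle is about 89.85°, which gives a ratio of roughly 390.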

Still within astronomy, an important change occurred with the
advent of Galileo’s telescope. Galileo took advantage of this new instrument to
observe the Moon and the planets. It was still observation, but that observation was
now carried out by means of an instrument whose mode of operation Galileo did not
initially understand and which, as mentioned above, Kepler would theorize in 1611. These new
observations, inaccessible without the new instrument, made it possible to question 
the Ptolemaic order of the planets. Indeed, Galileo’s observation of the phases of 
Venus, which are like those of the Moon, proved that Venus revolves around the Sun 
and not the Earth. Note that all these observations were independent of the theories 
that they subsequently engendered, even though they, of course, presupposed 
specific concepts such as the fact that the cosmos is spherical and that Euclid’s 
geometry can be used to measure angles and relative distances between the planets, 
concepts that were inaccessible to Babylonian astronomers, for example. Observa- 
tions may also implicitly contain metaphysical conceptions that guide the research, 
but this does not logically entail a dependence on theory in the sense that this term 
has been defined above. Thus, astronomical data were long compatible with
Copernicus’ system as well as with the mixed system of Riccioli and Brahe, which
placed all the planets around the Sun except for the Earth, which remained fixed and
around which the Sun and its planets rotated.

Observational physics, which does not control the production of the phenomenon 
it studies, can be combined in a complex way not only with measuring instruments in 
the case of visible objects, such as stars, but also with instruments for the detection of 
invisible entities. This is the case for the detection of gravitational waves. Their
existence was predicted in 1916 on the basis of Einstein’s theory of general relativity,
and physicists have built a device to detect the waves based on the effects, predicted
by the theory, that they are supposed to produce in the instrument. Here, the device is
built entirely according to the properties expected and predicted by the theory, but
the phenomenon itself is neither controlled nor produced by the device, and even the
frequency of the phenomenon’s appearance is unknown. The device only reacts
when, by chance, a gravitational wave passes in its vicinity. As we know, these
waves were finally detected in 2015, a result announced in 2016, by an expensive and
complex laser interferometry-based instrument called LIGO (Laser Interferometer
Gravitational-Wave Observatory). The fact that this instrument is called an
“observatory” reminds us that it is a form of observation, but by instrumented
detection, because the entity itself is obviously invisible to the naked eye. This case is
distinctive because the detection is then combined with the theory in order to trace
the cause of the production of these gravitational waves. But here as elsewhere, the
theory of the instrument (the laser interferometer) differs from the theory of the
phenomenon (general relativity).

In sum, observations and experiments can be the result of empirical exploration 
with little theoretical guidance, as when a radio telescope scans the sky for possible 
new objects, or, on the contrary, they can be designed to test a particular theoretical 
statement, as was the case with LIGO seeking gravitational waves or the CERN 
particle accelerator searching for the Higgs boson. 


2.2.1 Limited Reproducibility of Observations 


The question of the reproducibility of observations can be problematic in the case of 
phenomena whose source and cause are beyond control and whose frequency of 
occurrence is unknown. In 1572, Tycho Brahe observed a new star in a sky that
had until then been considered unchanging. He was lucky: no other new star
appeared for many years afterwards. Tycho’s contemporaries were able to quickly
confirm his observations because the new star shone in the constellation of
Cassiopeia for about 16 months. Astronomers would surely have doubted the
reality of his surprising observations if the celestial phenomenon had lasted only a
few days and had not been seen, and thus confirmed, by other astronomers.
While other new stars did appear in the European sky, such as one in 1596
and the star observed by Kepler in 1604, the phenomenon remains rare and
unforeseeable. It is currently estimated that a supernova, the kind of event observed
in 1572 and 1604, occurs in our Galaxy about once a century (van den Bergh 1993).

Having only one instrument available to detect a natural phenomenon also raises problems for confirming the detection, and therefore the existence, of the phenomenon. Thus, it would have been possible, in principle, to doubt the detection of a gravitational wave in December 2015 by LIGO if only a single detector had been available. Indeed, one might have believed that the signal was simply an artifact intrinsic to that instrument. But the construction of two LIGO detectors, one in Hanford, Washington, and the other in Livingston, Louisiana, eliminated such reasonable doubt: comparing the signals recorded by the two instruments at the same time made it possible to establish that the detection was real. Given the speed at which gravitational waves propagate, a genuine wave should be detected by both instruments with a very slight time lag. Observing such a lag confirmed that the first instrument had indeed detected a wave and not an artifact. A third instrument, Virgo, built near Pisa, Italy, has since been added. In August 2017, Virgo detected a gravitational wave for the first time, an observation that was also recorded by the two American instruments with a time shift of a few milliseconds, confirming the reality of the passage of a gravitational wave.
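The logic of the two-detector check can be made concrete with a rough bound. Since gravitational waves travel at the speed of light, a single wave cannot reach the two LIGO sites more than about d/c apart, where d is their separation. The sketch below assumes a separation of roughly 3,000 km and treats the detection times as hypothetical inputs; it is an illustration of the consistency argument, not LIGO's actual coincidence analysis, which is far more elaborate.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
# Assumption: the two LIGO sites are roughly 3,000 km apart.
HANFORD_LIVINGSTON_M = 3.0e6


def max_lag_s(separation_m: float) -> float:
    """Largest possible arrival-time difference, for a wave travelling along the baseline."""
    return separation_m / SPEED_OF_LIGHT_M_S


def is_coincident(t_first_s: float, t_second_s: float,
                  separation_m: float = HANFORD_LIVINGSTON_M) -> bool:
    """True if two detections could come from one wave crossing both sites."""
    return abs(t_second_s - t_first_s) <= max_lag_s(separation_m)


print(f"maximum lag: {max_lag_s(HANFORD_LIVINGSTON_M) * 1e3:.1f} ms")
# Hypothetical detection times, in seconds:
print(is_coincident(0.000, 0.007))  # a 7 ms lag is physically allowed
print(is_coincident(0.000, 0.050))  # a 50 ms lag cannot come from a single wave
```

A lag of a few milliseconds, as reported for the three-detector event, sits comfortably inside the roughly 10 ms window, whereas an instrumental artifact at one site would have no reason to be mirrored at the other within that window.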


2.3 Different Types of Experimentation 


While we should not minimize the importance of observation in physics, even though it is nowadays highly instrument-aided and no longer relies on direct observation as was the norm before the beginning of the seventeenth century, the fact remains that physics has since been characterized by an intensive use of experimentation. This experimentation aims at acting on nature by means of increasingly complex instruments, which are more and more "materialized theories", as the French philosopher Gaston Bachelard put it, in that they embody concepts derived from theories (Bachelard 1983, p. 16). The history of physics mainly offers examples of experimentation in its restricted sense: experiments that produce or modify a phenomenon, or that establish the properties of a natural phenomenon by submitting it to an experimental set-up.

In physics, experimental practices differ greatly according to the major specialties (astrophysics, elementary particles, solid state, atomic and molecular physics, optics, etc.). As physics is generally highly mathematized, it allows for predictions that can then be tested through experimental set-ups. In the more empirical fields of physics, where the theoretical basis of the phenomena is less well understood, such as high-temperature superconductivity, the tests are more haphazard, even if they are anchored in certain core ideas for testing different composite materials based on copper, sulfur, lanthanides, etc., an approach that is more a matter of chemical tinkering than of theoretical deduction. The aim here is to produce a phenomenon by trial and error rather than to test the validity of a scientific prediction. The purposes served by experimentation in the different specialties of physics include, non-exhaustively: to learn about the properties of already known entities such as light or electrons; to observe and measure the effect of an electric field or a magnetic field on the behavior of atoms, such as the shift in the frequencies of the radiation emitted by atoms (named the Stark effect and the Zeeman effect, respectively); to generate a new entity, such as the Higgs boson or antiparticles; to explore a region of energy in collisions between particles to see whether new particles emerge; to observe the change in behavior of matter at very low temperature (superconductivity and superfluidity); to modify the physical or chemical properties of a compound to make it superconducting; to establish dependence relations between two phenomena (electricity and magnetism) or between two variables of the same phenomenon (pressure and temperature) and thus establish a new empirical law, etc. (Franklin 1986).

As with observation, the relationship between theory and experimentation varies and does not simply consist in testing existing theories. It is therefore useful to distinguish two main purposes of experimentation. On the one hand, there is experimentation with a theoretical aim, which explicitly seeks to test a theory or a prediction. On the other hand, there is exploratory experimentation, which covers a wide spectrum of empirical research that does not explicitly aim to test a theory but, as its name indicates, seeks to explore unknown or little-known phenomena in order to identify their properties without having in hand a precise conceptualization of the
phenomenon under study. This is the kind of experimentation that has been widely 
studied. 

The distinction between a so-called exploratory experiment that is not based on a preexisting theory and an experiment aimed at testing an existing theory is sometimes contingent and depends on the point of view adopted by the actors. Consider the properties of light. Regardless of whether light is a pressure, as Descartes thought, or a stream of particles, as Newton suggested, a researcher may seek to measure its speed of propagation simply to find out whether it is finite or not, without reference to a prior theory concerning the specific nature of light. Thus, Galileo raised this question in his Discourse on Two New Sciences, well before the emergence of the aforementioned theories, but failed to reach a positive conclusion, although he favored a finite speed (Romer and Cohen 1940). Once established, exploratory results can of course influence theories, but the measurement itself is independent of them. Thus, after Roemer had established, in 1676, that the speed of light is finite, Descartes' theory was refuted, since Descartes had been entirely convinced that the speed was infinite (Boyer 1941). In an entirely different field, elementary particle physics, Koray Karaca (2013, p. 126) also notes that the distinction between the two types of experiments may depend on the perspective of the actors involved.

There are other borderline cases between exploration and verification. Consider Galileo's use of an inclined plane to study falling bodies. The inclined plane was an original measuring instrument. Measuring the fall of a body precisely with the naked eye is difficult, if not impossible. The fall must somehow be "slowed down" by means of the inclined plane: the less inclined the plane, the slower the descent, and vice versa. This rather abstract idea allowed
Galileo to construct and quantitatively verify his law of falling bodies.¹ But such an experiment can be considered exploratory if we suppose that Galileo did not yet have his law, or as a test if he had first deduced it through reasoning.

¹The historian Alexandre Koyré considered this experiment to be imaginary, whereas documents and reconstructions allow us to conclude that Galileo really did perform these experiments. See, in this regard, the classic paper by Thomas B. Settle (1961).

A clear seventeenth-century case of an experiment designed to test a theory is Pascal's famous experiment on the Puy-de-Dôme in 1648, which measured the weight of air: if the height of the liquid in Torricelli's tube (the barometer) is due to the pressure exerted by the column of air above the instrument, then this pressure should be lower at the top of a mountain, since there is less air above it. This is a simple deduction, but one that can be difficult to verify experimentally (Mazauric 1998; Jones 2001).

In these two examples, the variables are controlled: with the angle of the inclined plane fixed, we can measure the fall time, and by changing the angle, we can compare the fall times for a given distance. In Pascal's case, one can change the liquid in the barometer or the height to which the tube is carried. The higher the barometer is carried above a given reference level (sea level, for example), the more the liquid in it will fall. These experiments are relatively easy to reproduce. They involve, of course, measurement errors whose size depends on the skill of the experimenters, but their results are more than sufficient to establish the reality of the observed phenomena and, if necessary, the validity of their interpretation. While the cases of Galileo and Pascal can be associated with the testing of a theoretical conception, there are also many cases of exploratory experimentation that are not at all linked to prior knowledge of the phenomena studied. Consider Faraday or Ampère, who looked for relationships between two phenomena, in this case electricity and magnetism, in order to establish empirical laws (Steinle 1997). Still in the nineteenth century, consider Ohm, who studied the relationships between the voltage, the current and the resistance of an electric wire through which a current flowed (Schagrin 1963).
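In modern notation, which neither Galileo nor Pascal used, the two controlled set-ups discussed above can be summarized as follows. This is a sketch under idealizing assumptions: a frictionless plane of length $L$ inclined at angle $\theta$, with $g$ the acceleration of free fall, and a barometric liquid column of height $h$ and density $\rho_{\mathrm{liq}}$ in hydrostatic equilibrium with the air pressure $P_{\mathrm{air}}$ at altitude $z$.

$$
a = g\sin\theta, \qquad L = \tfrac{1}{2}\,a\,t^{2} \;\Longrightarrow\; t = \sqrt{\frac{2L}{g\sin\theta}}
$$

$$
P_{\mathrm{air}} = \rho_{\mathrm{liq}}\,g\,h \;\Longrightarrow\; h = \frac{P_{\mathrm{air}}}{\rho_{\mathrm{liq}}\,g}, \qquad \frac{dP_{\mathrm{air}}}{dz} = -\rho_{\mathrm{air}}\,g
$$

The first line shows why a gentle slope slows the fall ($t$ grows as $\theta$ decreases) while preserving the proportionality between distance and the square of time; for a uniform ball rolling without slipping, $a = \tfrac{5}{7}\,g\sin\theta$, which changes the constant but not that proportionality. The second line shows why the column drops as the barometer is carried upward ($P_{\mathrm{air}}$ decreases with $z$) and why changing the liquid changes the height: mercury being about 13.6 times denser than water, a column of about 0.76 m of mercury corresponds to roughly 10.3 m of water.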

Unlike biology or medicine, physics does not really need the notion of a “control 
group”, but rather must ensure that the phenomenon is isolated from parasitic 
influences and that no effect is measured when the variable is absent (for example, 
when the electric or magnetic field is zero). Depending on the type of measurement, 
it is also necessary to calibrate the apparatus to determine the sensitivity and the 
order of magnitude of the measurement. In statistical physics, things become a little 
more complicated, but the basic principle remains the same. In all cases, it is a 
question of finding an experimental set-up that allows for the measurement of the 
chosen phenomenon. Thus, when Jean Perrin wanted to verify the validity of the 
laws of Brownian motion established by Einstein in 1905, he set up an experimental 
device that allowed multiple observations and the calculation of averages (Perrin 
1913). It was a question here of testing the validity of a statistical law established on 
theoretical bases. In the same way, Millikan validated Einstein's law of photoelectric emission without himself accepting the reality of the light particles on which the calculation was based. This example also shows that researchers can remain agnostic about the entities at the basis of a theory (in this case, the photon) while accepting a law in a phenomenological and empirical way. It becomes more difficult, however, to doubt the existence of the photon after the Compton experiments, which clearly showed that a photon colliding with an electron behaves like a corpuscle and obeys the laws of conservation of energy and momentum.
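Perrin's strategy of testing a statistical law through many observations and averages can be illustrated with a small simulation. The sketch below assumes one-dimensional Brownian motion with Gaussian steps of variance $2D\,\Delta t$ and uses Einstein's relation $D = k_B T / (6\pi\eta r)$ with illustrative values for a small grain in water; the particle size, temperature, and viscosity are assumptions, not Perrin's data. Averaging the squared displacement over many simulated grains recovers Einstein's prediction $\langle x^2 \rangle = 2Dt$, whereas any single trajectory scatters widely.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein_d(temp_k: float, viscosity_pa_s: float, radius_m: float) -> float:
    """Einstein's diffusion coefficient for a sphere: D = k_B T / (6 pi eta r)."""
    return K_B * temp_k / (6 * math.pi * viscosity_pa_s * radius_m)


def mean_squared_displacement(diff_coeff: float, dt: float, n_steps: int,
                              n_particles: int, seed: int = 1) -> float:
    """Average of x^2 after n_steps for independent 1-D random walkers.

    Each step is Gaussian with variance 2 * D * dt.
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(2 * diff_coeff * dt)
    total = 0.0
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_sd)
        total += x * x
    return total / n_particles


# Illustrative values (assumptions): a 0.5-micrometre grain in water at 20 C.
D = stokes_einstein_d(temp_k=293.0, viscosity_pa_s=1.0e-3, radius_m=0.5e-6)
dt, n_steps = 0.1, 200  # 20 s of simulated observation
t = dt * n_steps
msd = mean_squared_displacement(D, dt, n_steps, n_particles=2000)
ratio = msd / (2 * D * t)  # Einstein's prediction: <x^2> = 2 D t
print(f"D = {D:.2e} m^2/s, simulated/predicted = {ratio:.3f}")
```

The ratio comes out close to 1 only because many grains are averaged; with a handful of grains it would fluctuate considerably, which is why the averaging built into Perrin's set-up mattered.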


2.3.1 Difficulties of Experimentation 


The difficulties of experimentation vary greatly according to the case and the 
domain. They are mainly methodological and technological, most often related to 
the weakness of the signal sought. In elementary particle physics, for example, we 
sometimes obtain ambiguous signals or artifacts. In this field, results are checked with statistical tests that estimate how likely it is that the observed event is due to chance. Thus, a deviation of the signal from the noise of more than 5 standard deviations ("five sigma") is required to claim the existence of a new particle. More rarely, physicists look for a single event whose characteristics are specific enough for its existence to be confirmed on its own (Galison 1997). Finally, faced with the
production of massive data on particle interactions, data selection methods involving 
sophisticated algorithms based on machine learning are nowadays used to help 
identify the most significant events in the billions of particle interactions produced 
in accelerators (Radovic et al. 2018). 
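To give a sense of how demanding the five-sigma threshold is, the sketch below converts a number of standard deviations into the probability that background noise alone would produce an excess at least that large. It assumes a Gaussian background and the one-sided convention commonly used in particle physics; under these assumptions five sigma corresponds to a probability of about 3 × 10⁻⁷, roughly one chance in 3.5 million.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that Gaussian background alone fluctuates upward by at least
    n_sigma standard deviations: 1 - Phi(n) = erfc(n / sqrt(2)) / 2."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


for n in (3, 5):
    p = one_sided_p_value(n)
    print(f"{n} sigma: p = {p:.2e}, i.e. about 1 chance in {round(1 / p):,}")
```

The contrast with three sigma (about one chance in 740) explains why particle physicists speak only of "evidence" at that level and reserve "discovery" for five sigma.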

The increasing calculability based on well-established theories, facilitated by the 
existence of progressively more powerful computers, has multiplied the modeling of 
phenomena. The increased use of digital modeling has also given rise to the idea of 
“digital experimentation”. Unless, however, we endorse some form of absolute 
idealism that does not distinguish the real material world from its numerical and 
mathematical representation, this kind of experimentation does not have the same 
epistemic value as real experiments. Numerical “experiments” can of course inform 
us about the properties of the model itself but not about its validity which, in physics, 
must be established by the confrontation with real data. In sum, there is indeed an 
epistemic superiority of experimentation over simulation (Roush 2018; Varenne and 
Silberstein 2021). 

The question of the replication of results arises differently for experimentation than for observation because, in principle, experimentation produces the phenomenon in a controlled environment. The existence of unique instruments in Big Science
can raise problems and requires additional methodological precautions because 
results obtained in this context cannot be reproduced in other laboratories using 
comparable apparatus. For example, the LHC at CERN is currently the only place 
where Higgs bosons can be produced. In this case, validation is based on the use of 
different detectors whose results are analyzed by different teams who must arrive at 
the same conclusions in order for the phenomenon to be considered validated. In the 
case where only one detector exists, the data is submitted to different teams who 
analyze them independently to see if they draw the same conclusions. This mini- 
mizes the danger of validating artifacts due to the complexity of the equipment used 
and avoids possible confirmation bias. 

Particle physics, with its frequent announcements of new particles, has accus- 
tomed us to thinking in terms of discoveries of new entities: electron, neutron, quark, 
Higgs boson, gravitational wave, etc. For scientists, experiments of this kind are 
based on the presupposition of a “realism of entities” as Ian Hacking (1983) puts 
it. Once the process of detection or production of these particles has been stabilized, 
Hacking rightly considers the existence of these entities to be confirmed. They can then
become tools for studying other objects. Thus, at the time of the discovery of
X-rays, physicists first studied their various properties: frequency, reflection, refrac- 
tion, polarization. But after having studied their object, so to speak, they used X-rays 
as a tool for doing other things. Technical improvements sought to produce X-ray 
beams to study, for example, the structure of crystals. In short, we stop 
experimenting on an entity when we believe that we have defined its fundamental 
properties, which can then be the subject of increasingly precise measurements in 
connection with metrology, for example. The improvement in the precision of the
measurements, such as those of the very low mass of neutrinos, long considered to 
be massless, allows us to better understand the nature of the entity studied. 


2.3.2 Experimentation Regress 


Sociologist of science Harry Collins has advanced the idea, dear to social construc- 
tivists, that agreement on the validity of an experimental phenomenon owes nothing 
to the empirical facts observed, since these can always be called into question in an 
infinite series of argumentative regress (Collins 1985). This he calls the “experi- 
menter’s regress”, whereas it would be more precise to talk of an “experimentation 
regress” because it is the experiment that is at issue and not the experimenter. 

Collins’ argument is in fact identical to that of the Greek sceptic, Sextus 
Empiricus, who established that to judge the appearances of objects, one needs an 
instrument that provides a basis for the judgement. But, to confirm the validity of this 
instrument, one needs a demonstration, and to verify this demonstration, one needs 
another instrument, thus creating an infinite regress (Godin and Gingras 2002). In 
physics, this argument is valid only for phenomena at the limit of measurement, when the signal-to-noise ratio is so low that one can doubt the existence of the phenomenon unless one already believes, for other reasons, that it exists. A similar situation obtains if the observed phenomenon is incompatible with the widely accepted theoretical framework, as was the case in the controversy surrounding the experiments on the effect of high dilutions of antisera on basophil degranulation (Ragouet 2016). In the latter case, the experimental system used was considered unstable, so that the results were not easily reproducible by independent teams. In contrast, the 1986 announcement by an IBM Switzerland team of the surprising and unprecedented phenomenon of so-called "high-temperature" superconductivity in ceramic oxide materials was quickly accepted because the results were reproduced without much difficulty by other teams (Nowotny and Felt 2002). Researchers who are already convinced of the existence of a phenomenon tend to accept confirming experiments as valid, even in borderline cases, whereas researchers who do not believe in the existence of the phenomenon (homeopathy, in the case of the effect of high dilutions of antisera on basophil degranulation) have every reason to explain positive results as artifacts, if not as the product of outright manipulation or fraud. It can, however, be safely predicted that if a phenomenon's signal is strong and stable enough, no regress will arise, and the "answer" provided by nature in the context of the experiment will be clear and quickly accepted.
2.4 Conclusion


Between the beginning of experimental physics in the seventeenth century and the 
diversified and multiple practices of contemporary physics, both empirical and 
theoretical developments have been made based on increasingly robust and complex 
instrumentations, as the studied phenomena have moved away from direct sensory 
perception and common sense. While observation remains the rule in astrophysics, it 
has become indirect and quite dependent on very sophisticated instruments that are 
themselves developed and analyzed via a set of generally stabilized theoretical 
considerations. Experimentation with a theoretical focus remains important, but it does not eliminate either experimental or observational exploration, as shown by the scanning of the sky by the European space telescope Gaia, which has accurately mapped 1.7 billion stars in our galaxy. In sum, the diversity of observational and experimental approaches remains, even though the development of knowledge leaves less room than in the eighteenth and nineteenth centuries for purely exploratory experimentation. Because theories now cover the entire field of physical objects, it is easier to justify experimentation and the construction of new instruments by invoking predictions to be tested than by appealing to simple curiosity about a little-known field or object. This tendency may be linked to the fact that researchers themselves value theory more than experimentation and most often require a theoretical justification for any new experiment proposed.
Chapter 3
Experimentation in Chemistry


Jean-Pierre Llored 


From a historical point of view, chemistry has always been both a science and an 
industry. It has, since its beginnings, been materially creative, productive and opera- 
tive. Chemistry inevitably refers to investigative approaches that are closer to descrip- 
tions than to hypothetical-deductive reasoning, to a dizzying pluralism of instruments, 
procedures and methods, to a knowledge that is often practical and mediated by doing. 
In the definition, conceptualization and realization of what chemists do (which is always open and provisional), it is nearly impossible to disregard the reagents.

Moreover, chemistry never ceases to challenge the distinctions between science 
and technology. It casts into doubt distinctions concerning the natural and the artificial, 
the intrinsic and the relational, theory and practice, knowledge and interest, science 
and art. Thus, chemistry opens a new perspective on the ways of “doing science”, and 
on the recourse to experimentation to study, and act on, already existing matter or 
matter that we produce. In order to do this, we will first consider the objects of 
chemistry and the role played by experimentation in the constitution of these objects. 
We will then examine, through examples, how chemists set up experimental protocols 
to deal with the dependence of chemical bodies on the inert or living environments in 
which they are found. To this end, we will focus on the preparation of reference matrices
for measurement and on the validation of results.


3.1 Objects of Chemistry and Experimentation 


Chemists have created a wide variety of instruments, experimental devices, pro- 
cesses, know-how, explanations, theories and models, all of which are based on 
substances and chemical reactions (molecules, materials, etc.). In this framework, 
substances and reactions are interdependent: it is impossible to think of substances without the reactions that allow them to be obtained, and vice versa. A substance is the result of successive purifications that allow it to be isolated from other substances for a certain period. This purified substance then allows chemists to define a new class of substances that all take part in "similar reactions" with other substances. Thus, substances have always been classified in relation to each other according to the types of reactions in which they participate and of which they are the object. In this sense, alchemy at first, then chemistry up to its most contemporary form, amount to a vast undertaking of classification, but also to the creation of substances by means of experiments that make them react with one another, separate them and purify them. Let us illustrate this mutual dependence of bodies and chemical reactions by placing it in the experimental framework that makes it possible.

Among the many examples that could be drawn from any period of the history of 
chemistry, the tables of affinity or ratios between substances, essentially empirical, 
made it possible, in the eighteenth century, to classify substances in relation to each 
other according to the notion of "displacement" (Gyung 2003), which the chemist Étienne-François Geoffroy, author of the famous Table of the different ratios observed between different substances (1718), explained as follows: "every time that two substances which have some disposition to join with each other are found united together, if a third substance arises which has more connection with one of the two, it unites with it by making the other one let go" (cited by Partington 1962, p. 53). This third substance, which we will denote "C", undoes the pre-existing union, which we will call "AB", and forms a new union, the substance "AC", releasing, for example, the body B into the surrounding environment. These tables classify substances by affinity in the form of columns. The first line of a column corresponds to a reference body; the substances that react with this reference substance are listed below it, in order of decreasing affinity (Holmes 1989, 1996). The results were
obtained by observing a displacement during a contact of bodies, under the action of 
fire or a liquid agent. They allowed chemists to learn to consider, at the same time, 
several factors favoring displacements such as the proportions of the substances 
present, the duration and intensity of heating, the volatility of the substances, the 
duration of the reaction, the instruments used (oven, glass devices) and their 
geometric and material characteristics. 
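Geoffroy's displacement rule can be read almost as a lookup procedure on a column of the table. The sketch below encodes a single column as an ordered list and applies the rule quoted above; the substance labels "A" to "D" and their ranking are hypothetical placeholders, not entries from Geoffroy's actual table.

```python
# Hypothetical column in the style of an affinity table: the head substance "A",
# followed by substances ranked by decreasing affinity for A.
AFFINITY_TABLE = {"A": ["C", "B", "D"]}


def displaces(table, head, bound, incoming):
    """True if `incoming` ranks above `bound` in the column headed by `head`."""
    ranking = table[head]
    if incoming not in ranking or bound not in ranking:
        return False  # no recorded affinity: no displacement predicted
    return ranking.index(incoming) < ranking.index(bound)


def react(table, head, bound, incoming):
    """Apply the displacement rule to the union head+bound meeting `incoming`.

    Returns the resulting union and the released substance (or None).
    """
    if displaces(table, head, bound, incoming):
        return (head, incoming), bound
    return (head, bound), None


print(react(AFFINITY_TABLE, "A", "B", "C"))  # (('A', 'C'), 'B'): C takes B's place
print(react(AFFINITY_TABLE, "A", "C", "B"))  # (('A', 'C'), None): B cannot dislodge C
```

The simplification is deliberate: a fixed ranking ignores exactly the conditions (proportions, heating, volatility, duration, instruments) that, as noted above, chemists had to learn to take into account for displacements to occur as predicted.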

To produce his table of ratios, Geoffroy undertook a synthesis, unheard of at the 
time, of the numerous experimental results obtained during the sixteenth and sev- 
enteenth centuries by metallurgists and traditional craftsmen, apothecaries who 
prepared remedies in the laboratory adjoining their dispensaries, and alchemists 
who analyzed the composition and reactivity of minerals. In doing so, he drew on 
the metallurgical treatises De Re Metallica. Libri XII by G. Agricola (1556, 
republished in Agricola 1950) and De la Pirotechnia by V. Biringuccio (1540, 
republished in Biringuccio 1943) as well as on the chemistry treatises written by 
Nicaise Le Febvre (1615-1669), Christopher Glaser (1629-1672) and the monumental Cours de Chymie by Nicolas Lémery (1645-1715). In the latter, Lémery describes very precisely the operations to be carried out (weighing ingredients, physical preparations in the form of powder or solution, the appropriate type of distillation, calcination, filtration, extraction with water or spirit of wine, the type of furnace or water bath, etc.) for transforming substances from the mineral, vegetable, and animal worlds into remedies and for characterizing and storing the final product (Lémery 1757). These treatises describe numerous experimental protocols and involve many instruments, such as those in Lémery's Cours de Chymie, which allow one to carry out distillation with the help of a blast furnace or a special type of water bath using mercury. It is the details (the shape and height of the alembic, the size, volume and quantity of water in the barrel that allows the vapors to condense, the nature of the bricks that make up the furnace, the size of the water bath and the way in which the hot water is poured into the mercury vat) that make it possible to extract, purify, displace and even produce a new substance.

This affinity table is based on two types of experiments. The first, inherited from 
the metallurgists, consisted in implementing the displacement of a metal by another 
metal in a saline solution, using numerous instruments to extract, refine and melt 
metals (Partington 1961). The second, derived from the apothecaries, involved 
decomposing a salt by the action of an acid, a base or another salt, using reversible 
reactions that the distillations did not permit (Stillman 1960). Reversible is to be 
understood as the possibility of preparing a solution or synthesizing a new substance, 
while remaining capable of recomposing the starting reagents by means of one or 
more chemical operations. The historian Ursula Klein explains that this classification 
of substances by the displacement reactions in which they are involved made it 
possible to link two vast sets of practices that were not yet linked. This connection 
allowed chemists to learn to conceive of fire, not as a “principle”, a notion accepted 
since antiquity, but as a solvent or as an instrument, which, in the end, allowed 
chemists to gradually propose a global and operative interpretative framework for 
substances and chemical reactions (Klein and Lefevre 2007; Klein 1995). These 
affinity tables had considerable influence until the beginning of the nineteenth 
century, making it possible to produce new criteria for the identification of sub- 
stances (by extraction or by recreation in the laboratory), thanks to the instruments 
and specific know-how put in place by “table makers” such as Gellert, Bergman and 
Wenzel (Holmes and Levere 2000). Substances and reactions were interdependent. 
This interdependence cannot be detached from the set of experimental procedures 
and instruments developed by practitioners to produce, analyze and stabilize these 
substances. The deployment of so many instruments and so much technicality 
pervades the entire history of alchemy and chemistry. It is not unlike the way Ian 
Hacking defines the act of “experimenting” in science: “To experiment is to create, 
produce, adjust and stabilize phenomena” (Hacking 1983, p. 230). 

Above and beyond this example, in the work of chemists, where substances and 
reactions cannot be conceived of independently, the notion of “purity” is a condition 
of possibility for the construction of classifications and the creation of new sub- 
stances. To understand the importance of this notion, it is necessary to recall the 
cultural context in which it is embedded. In scholastic culture, art could only imitate nature: the gold of the alchemists was a copy, and it was up to them to develop tests to distinguish real gold from counterfeits. The alchemists, then the chemists, thus came to be recognized as experts in the fight against fraud. It was in the industrial and commercial spheres that experimental protocols for verifying the authenticity of substances were developed. This involved measuring their purity through characteristic tests of dissolution and other chemical transformations (Bensaude-Vincent 2011; Klein and Spary 2010). It is within this framework that chemists designed instruments and set up numerous experimental protocols to obtain the degree of purity required to achieve their objectives. Lémery's distillation devices mentioned above have been replaced by the distillation set-ups used today, to which are added the techniques known as "extraction", for example, "liquid-liquid extraction", which consists of transferring a substance from a solvent from which it is difficult to separate into another from which it can be isolated. Recrystallization techniques played, and still play, an important role in the purification of crystalline solids. They are based on the difference in solubility between the product of interest and the impurities in a given solvent; they use, for example, a solvent in which the impurities are soluble both hot and cold, whereas the product of interest is soluble hot but only slightly soluble cold. These are techniques in which the choice of
filters and funnels is crucial for obtaining a good purification yield. We could also 
mention ultrafiltration, centrifugation, chromatographic and electrophoresis tech- 
niques. These techniques are now coupled with physicochemical analysis methods 
such as mass spectrometry and nuclear magnetic resonance, which contribute to the 
quantification of the purity obtained, while providing information about the spatial 
structure and composition of the substances (Reinhardt 2006). The number of 
instruments developed, techniques used, procedures to be followed, is impressive 
and makes chemistry a highly creative discipline, operative and dependent on 
devices to act on substances to purify them. 
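The principle of recrystallization described above, and why it yields "degrees of purity" rather than an absolute, can be sketched with a simple mass balance. The model assumes idealized behavior: everything dissolves in the hot solvent, each solute crystallizes independently on cooling, and no mother liquor is trapped in the crystals. All quantities and solubilities are hypothetical.

```python
def recrystallize(product_g, impurity_g, solvent_ml,
                  product_sol_hot, product_sol_cold, impurity_sol_cold):
    """Idealized single recrystallization; solubilities in g per 100 mL of solvent.

    Returns (purity of the recovered crystals, fraction of product recovered).
    """
    if product_g > product_sol_hot * solvent_ml / 100:
        raise ValueError("not enough hot solvent to dissolve the product")
    # On cooling, whatever exceeds the cold solubility comes out as crystals.
    product_crystals = max(0.0, product_g - product_sol_cold * solvent_ml / 100)
    impurity_crystals = max(0.0, impurity_g - impurity_sol_cold * solvent_ml / 100)
    recovered = product_crystals + impurity_crystals
    purity = product_crystals / recovered if recovered else 0.0
    yield_fraction = product_crystals / product_g
    return purity, yield_fraction


# Hypothetical sample: 10 g of product contaminated with 0.5 g of impurity.
purity_before = 10 / (10 + 0.5)
purity_after, recovery = recrystallize(
    product_g=10, impurity_g=0.5, solvent_ml=100,
    product_sol_hot=12, product_sol_cold=1, impurity_sol_cold=2,
)
print(f"purity {purity_before:.3f} -> {purity_after:.3f}, yield {recovery:.2f}")
# purity 0.952 -> 1.000, yield 0.90
```

Even in this idealized case, purification has a cost: the gram of product that remains dissolved in the cold solvent is lost, which is why chemists speak of a purification yield alongside the purity obtained. In practice, trapped solvent and co-crystallization keep the purity below the ideal value.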

This notion of purity therefore refers to purification processes and techniques, and 
not to an intrinsic property of the material. Purity is the result — always provisional — 
of an action, constantly to be renewed. Purification does not end up with an 
“absolute”, but results in, as the chemists themselves say, “degrees of purity”. A 
chemical substance, despite what its formula indicates, is never a substance that can 
be thought of and used in isolation, in itself, for itself. As the philosopher Joachim 
Schummer writes: 


it is only because our chemical species per definition retain their identity during purification, 
that we are able to connect single facts of chemical relations with each other to build a 
systematic network structure of chemical knowledge. [...] The resulting classification has 
turned out to be again a network structure, with substance classes as nodes and chemical 
class relations as connections; it has enormous systematizing and predictive power with 
regard to chemical properties (Schummer 1998, pp. 131, 157).


Considering a chemical substance as isolated is useful for teaching and makes it 
possible to explain a certain number of stabilized regularities or, on the heuristic 
level, to foresee new devices, to propose new explanations. It is, in this way, a 
functionally efficient hypothesis to continue the adventure of chemical explorations. 
But, sooner or later, particular modes of action, instruments, or other substances will 
reappear in the writings and oral exchanges of chemists. No one would have been able to predict, for example, from a formula alone, such as CF2BrCl, or from the various representations of its spatial structure (chemistry uses many kinds of formulae), that chlorofluoroalkanes take part in ozone destruction (Llored 2017). It was only when the depletion of the ozone layer was made evident (Farman
et al. 1985) that scientists were able to establish that chlorofluoroalkane gases (then 
used, for example, in refrigerants) and methyl bromide (used, for example, in fire 
extinguishers) are dissociated in the stratosphere by ultraviolet radiation, releasing 
chlorine or bromine, which are then able to enter chemical reaction cycles leading to 
the destruction of ozone. To do this, researchers had to resort to inter-comparison 
procedures, since several experimental techniques (acid-base or oxidation-reduction 
assays, spectroscopy, chromatography, etc.), each with a different degree of preci- 
sion, are used to determine certain chemical concentrations. Moreover, they had to 
compare empirical results with those obtained by modelling and numerical simula- 
tions to estimate the reliability of the values obtained (Berthet and Renard 2013) and 
to cope with the numerous experimental difficulties linked to the involvement of 
about fifty chemical species present in the stratosphere, the low concentrations
measured (a few molecules of chlorine per billion air molecules, of bromine per
ten billion air molecules, and a few ozone molecules per million air molecules) and
some important quantities that are difficult to extract from spectral analysis
(in particular the absorption cross-sections).

Chemists need to make their substances react in order to observe and explain. At
some point, this step becomes necessary, depending on the degree of precision
required by the chemical activity in question. This is not to say that chemists
are incapable of making predictions. The tables and classifications that they have
constantly produced are necessary and effective tools for making predictions,
but they are not sufficient. Substances and reactions are indeed indispensable
for giving a detailed account of any activity of chemists, together with
the instruments and processes on which these bodies and
reactions depend.

Given this framework, the debate among philosophers of chemistry concerning
the nature of chemical “objects” remains open. Philosophers of chemistry ask whether
these objects are substances, molecules, materials, stuff, chemical species, or
processes. The book Stuff: The Nature of Chemical Substances by Ruthenberg and Van
Brakel (2008) is one of the best studies on the subject. Van Brakel has, for example, 
studied the meanings that the word “substance” takes on according to the types of 
chemical practices in which this word is used. He shows that several definitions 
coexist, including, among others, the reference, at the microscopic scale, to a 
microstructure or to a given composition, or, at the macroscopic scale, to the 
“phase rule”, according to which the change of state of a pure substance, at a 
given pressure, takes place at constant temperature. The definition of the word 
“substance” depends on the operations used to act on the substances; operations 
that mobilize this or that property more than another depending on the instruments 
and processes used. Van Brakel concludes that this notion of substance can, in the
end, only be “pragmatic”, and he points out a great number of situations
where chemists have difficulty telling whether a sample contains one or several
substances (because, for example, different spatial structures may correspond to the
same chemical composition) (Van Brakel 2000a, b, 2012). By “pragmatic”, we
mean the practical effects observed by all the members of a community when the
term “substance” is used in a certain context, i.e. what researchers observe following
their experiments in the laboratory. This is the meaning that is tacitly shared by all
chemists and refers to the practical aspects linked to the use of the term “substance”.
Substance cannot be conceptualized without reference to certain operations and
certain instruments. It is grasped through description, which is reminiscent of the work
of Gilles-Gaston Granger (1994) on the relationship between form, operations, and
objects, even though Granger did not study chemistry. These operations correspond to a
set of reactions that allow these substances to be classified, and to the set of tests,
procedures, techniques and instruments deployed to that end.
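The phase rule that Van Brakel invokes can be made explicit with Gibbs's phase rule. This is a standard thermodynamic result recalled here for illustration, not taken from his texts. Let C be the number of independent components, P the number of coexisting phases, and F the number of degrees of freedom (intensive variables that can be varied independently):

$$
F = C - P + 2, \qquad C = 1,\; P = 2 \;\Longrightarrow\; F = 1 .
$$

For a pure substance undergoing a change of state, only one variable can therefore be chosen freely. Once the pressure is fixed, no freedom remains, and the temperature stays constant throughout the transition.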
Moreover, we should not forget three important points that clarify how the objects
of chemistry and experimentation are linked. First, chemists study composites: one
gram of glucose contains some 3 × 10²¹ glucose molecules. In addition, the
same substance can comprise several “chemical species”, due, for example, to different
structural arrangements.

Secondly, these substances are often “dispersed”. A polymer, for example, is synthesized
by linking monomer units to one another many times through chemical bonds. The number
of units within the chains varies according to the conditions of the experiment. Chemists
must therefore evaluate the dispersion of this type of compound around an “average”
value of the molecular weight. This is done by means of an analysis, for example by
chromatographic separation. In this way, they determine the number-average degree of
polymerization (the average number of units in the polymer chains), the mass-average
degree of polymerization, and the “polymolecularity index”, which gives an initial
idea of the distribution of the molar masses of the various macromolecules within the
polymer sample. Chemists must coordinate a wide range of scientific concepts,
instruments and know-how to understand this dispersion (chromatography, osmometry,
dynamic light scattering, ultracentrifugation, mass spectrometry, etc.). The
articulation of these means makes it possible to define a distribution, in short to
specify the multiple in the one. Chemists articulate a set of instruments and operations
and a language (“number- and mass-average degrees of polymerization”,
“polymolecularity indices”, etc.) to express and understand the dispersed character
of many substances. Thus, a single substance can be “composite” in
three ways: qualitatively, through the presence of several chemical species corresponding
to a given formula, which are separated using several reversible chemical
reactions and/or several techniques (chiral or supercritical phase chromatography,
fractional distillation, capillary electrophoresis, etc.); quantitatively, through the
dispersion, for example in number or size, of certain chemical
species; and, finally, constitutively, through the reactions without which it cannot be
obtained, acted upon or thought of, and which could not, in turn, be conceived and
realized without it.

Thirdly, in addition to the synthesis of substances (which consists
in producing them) and their analysis (which consists in determining what these
substances contain, both qualitatively and quantitatively), we must add an entire
segment of this science-industry that is too often forgotten but has always been
important, namely “formulation”. By this word, chemists mean all the
operations involved in mixing or shaping ingredients (raw materials), which are often
incompatible with each other, in order to obtain a commercial product characterized by its
use, as in the case of a toothpaste. What counts for a formulator-chemist is to make
chemical bodies coexist without provoking a chemical reaction between them, which
differentiates this activity from that of a synthetic chemist. The mixture, as such, is
one of the objects of chemistry. In this case, as in the previous ones, a considerable
number of chemical, rheological, colorimetric, tribological, and texturometric tests
and analyses are carried out on a cosmetic cream in order to stabilize it and to check
whether or not it meets the specifications, even though no complete theoretical
explanation of the properties of the mixture is accessible at the time these
measurements are carried out.
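The averages mentioned above can be illustrated with a short calculation. From a chain-length distribution one can compute the number-average degree of polymerization, the mass-average degree, and their ratio, the polymolecularity index (IUPAC now calls it dispersity). This sketch is not the authors' procedure. It assumes a hypothetical distribution given as chain length i (number of units) mapped to the number of chains N_i. It also ignores end groups, so the mass of a chain is taken as proportional to its length.

```python
def polymerization_averages(chains: dict[int, float]) -> tuple[float, float, float]:
    """Return (X_n, X_w, dispersity) for a hypothetical chain-length distribution.

    `chains` maps a chain length i (number of monomer units) to the number
    of chains N_i of that length. End groups are ignored, so the mass of a
    chain is assumed proportional to i.
    """
    if not chains or any(i <= 0 or n < 0 for i, n in chains.items()):
        raise ValueError("need positive chain lengths and non-negative counts")
    n_chains = sum(chains.values())                    # sum of N_i
    n_units = sum(i * n for i, n in chains.items())    # sum of N_i * i
    if n_chains == 0:
        raise ValueError("distribution contains no chains")
    x_n = n_units / n_chains                           # number-average
    x_w = sum(i * i * n for i, n in chains.items()) / n_units  # mass-average
    return x_n, x_w, x_w / x_n


# Hypothetical sample: equal numbers of 100-unit and 200-unit chains.
x_n, x_w, dispersity = polymerization_averages({100: 1, 200: 1})
print(f"X_n = {x_n:.1f}, X_w = {x_w:.1f}, dispersity = {dispersity:.3f}")
```

A dispersity of exactly 1 would mean that all chains have the same length. Any spread in chain lengths pushes the mass average, and therefore the index, above 1.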

The objects of chemistry are substances (more or less pure, more or less dispersed),
mixtures of substances, and the reactions without which these substances
and mixtures cannot be characterized, thought about or acted upon. Experimenting in
chemistry therefore means setting up technical devices that allow the production,
modification or analysis of a substance or a mixture in an intentional, systematic and
controlled manner, in order to isolate one or more parameters that contribute to
achieving the desired objective. This work is inseparable from the massive presence
of instruments of synthesis or analysis and of experimental procedures in which detail
is king. It is in this sense, it seems to us, that Bachelard wrote:


Indeed, whereas the subordination of the attributes to the substances may remain the ideal of 
an ontological science which believes at the same time in the productive power of the 
substance and in the deductive power of knowledge, it is necessary to come to the coordination
of the attributes between them, and then to the coordination of the substances between
them, when we want to seize the chemical experience as essentially correlative, as well as 
seize the theoretical thought as essentially inductive (Bachelard 1973, p. 26). 


We will now discuss the dependence of substances on the environments in which 
they can be found and the implications of this dependence for the conduct of 
experiments. 


3.2 Experimenting in a Situation of Dependence on Environments


3.2.1 Dependence on Environments of Substances and Chemical Transformations


Alchemists, and their successors, the chemists, have always known that the same 
substance reacts differently according to the liquid medium, the medical remedy, or 
the mixture in which it is found and acts. Contemporary nanochemists know, for 
example, that zinc oxide, with the formula ZnO, has different structures, reactivity, 
and toxicity depending on the acidity of the medium in which it is found (Aimable 
et al. 2010). They also know that the physico-chemical behavior of the medium in
which these crystals precipitate differs due to the presence and nature of the
precipitated zinc oxide. Moreover, for the same chemical composition and identical
proportions, nanochemists also know that the structure of a precipitated body 
depends on the size of its crystals, which in turn depends on the medium and the 
way the experiment is conducted. For example, a zinc oxide crystal 0.5 nm in 
diameter does not have the same internal structure as a crystal of the same oxide 
0.9 nm in diameter in a given medium due to surface phenomena. The structure is no 
longer a characteristic that the substance has independently of other substances. It 
depends on the size of the substance, the instruments used, the process involved and 
the reaction medium. Thus, crystals of different sizes, and therefore of different
internal structures, can very well coexist within the same sample, which we have
previously referred to as a “dispersion”.

The substance alone is not the real object of study for chemists. What chemists
have always studied, and what they are increasingly studying, given the new means
of action available to them, is the set of regularities associated with a substance in a
reaction medium, a solvent, a mixture of solvents or a mineral or vegetable matrix.
The medium is not just a passive thing from which the substance can be detached; on
the contrary, it participates, constitutively, in what the substance can do, and
reciprocally, the substance participates in what the medium can do and in what
characterizes it, bearing in mind that the substance and the reaction medium are distinct and
have different capacities. In the same way, and this has been known by organic
chemists since the nineteenth century (Tomic 2010) and is becoming even more
important in contemporary chemical practices, the reactions depend on this medium
and the medium on these reactions. This is particularly the case when chemists use
“complex fluids”¹, whose characteristics depend on their history, on the constraints
applied to them, on the interfaces involved, and on the multiple interactions between
chemical substances, taking into account, moreover, that these fluids are often placed in
conditions of metastability², which amplifies their interaction with the chemical
substances and reinforces the mutual dependence of the substances and of the
chemical transformations in a given medium. This interdependence is, once again,
deeply dependent on the presence of instruments of synthesis and analysis. Indeed,
this type of chemistry is made possible by the development of novel controlled
mixing reactors (microfluidics³ in the liquid phase, laser pyrolysis in the gas phase). For
example, segmented flow tubular reactors equipped with micro-mixers allow for a
thorough and efficient mixing of reactive solutions, in very short times of the order of
10 to 20 milliseconds (Aimable et al. 2011). Subsequently, through segmentation,
particles confined in small-volume droplets can grow more homogeneously during
their journey inside tubes of very fine diameters, thus controlling size dispersion
(Marre and Jensen 2010).

¹Complex fluids are binary mixtures with two coexisting phases: solid-liquid (suspensions or
solutions containing macromolecules such as polymers or giant micelles), solid-gas (granular
media), liquid-gas (foams) and liquid-liquid (emulsions), whose viscosity varies with the stress
applied to them and/or their “previous states”.

²Metastability is an attribute of a system that is not energetically (thermodynamically) stable, but
which appears to be so because of a very low transformation rate at our scale.

³The chemist George Whitesides defines microfluidics as “the science and technology of systems
that manipulate small volumes of fluids (10⁻⁹ to 10⁻¹⁸ liters), using channels the size of tens of
micrometers.” Miniaturized systems have made it possible to better control the nucleation stage
and to improve exchanges by limiting heterogeneities in the reactor volume (Whitesides 2006).

This new chemistry integrates precipitation techniques from so-called “soft”
chemistry (Livage 1977), which allow chemical reactions to be carried out at room
temperature and mostly in water, something that was impossible only 40 years ago, as
well as in situ analysis techniques such as near-field optical microscopy, various types
of X-ray scattering, and advanced atomic force microscopy. The “hyper-technicity” of
chemistry is taking a new form and making substances and chemical reactions ever
more dependent on their environments.

In short, a substance endowed with the capacity to act cannot be detached from its 
environment and the instruments and processes of synthesis. On the one hand, the 
substance must be understood in connection with the operations implemented to 
produce it to a provisional degree of purity; on the other hand, it must be character- 
ized by what it does due to the reactions in which it is involved in given environ- 
ments, all of which mobilizes an instrumentation often endowed with a high degree 
of technicality. Finally, these substances are “dispersed”. This dependence on the 
environment implies that the definition of a substance is always open and provi- 
sional, in the sense that other ways of acting on the body will make it possible to 
identify and stabilize other behaviors, other properties. The chemical substance is the 
origin or the result of an action on other substances through reactions whose 
characteristics depend on the environments in which these transformations take 
place and on the technical characteristics of the instruments used during these 
reactions. The substance thus refers to the set of procedural rules that every chemist,
as an agent, must follow in order to form, purify or identify it and to make it
react. This finding, which is the result of an investigation of the daily practices of 
chemists, is not unlike, in some respects, the definition of lithium proposed by 
Peirce, using a semiotic approach not based on the study of chemists’ practices. 
For Peirce, the definition of the word lithium refers to the sequence of actions to be 
performed on and with lithium, as well as to the series of reaction media to be used, 
by any chemist, to produce it once again, provided that certain well-defined exper- 
imental procedures are respected (Peirce 1931-1958, §330). Now, this dependence 
on media has consequences for the way chemists carry out experiments; a point that 
we will discuss before concluding. 


3.2.2 The Difficulties of Experimentation Linked to the Dependence on the Environment


In analytical chemistry, the term “matrix” refers to the substrate, i.e., the medium in
which the molecules to be characterized and measured are found (biological fluid,
food, plant, river water, etc.). It is generally a very heterogeneous medium in which
the molecules to be studied interact with the surrounding molecules in many ways.
For example, the determination of endocrine disruptors in river sediments poses
several problems, including the absence of a “blank” matrix and of a representative
reference matrix. It is impossible to obtain a “blank” river sediment, i.e., one that is
certain to be free of endocrine disruptors, since this type of pollution affects the
entire surface of the Earth, including the polar ice caps. Nor can one presuppose
that a sediment extracted from a given river is representative of sediments
from other rivers, as the granulometry and composition of organic and mineral
matter vary considerably from one river to another and even between two sites in
the same river.

The solution adopted by most accredited laboratories for this type of analysis is to 
use a reference matrix, which is a sediment “artificially” reconstituted from, among 
other ingredients, peat and sand in standard proportions (Bouchonnet and Kinani 
2013). This reference matrix is therefore a model of the environment studied; a 
model that has the advantage of standardizing the validation of the method between 
different laboratories, and thus allowing for the comparison of results between them. 
However, the reconstituted sediment can be criticized for not corresponding to
any “real” sediment actually present in a river. Undeniably, its composition
corresponds neither to that of a humus-rich sediment nor to that of a highly
sandy sediment. Consequently, although this practical solution makes it possible to
compare the results obtained by different laboratories and to settle disputes between
experts, it remains unsatisfactory in terms of the representativeness of the reference
matrix. It is therefore not entirely adequate for building a model that aims to
establish a satisfactory correlation in a given empirical context, in particular
in the ecotoxicological study of a substance. Moreover, creating a contaminated
sediment with optimal interactions between pollutants and sediment constituents
requires “mimicking” the natural conditions under which sediment becomes contaminated.
Thus, the water column in contact with the sediment must be spiked, agitated,
and aerated, and time must be allowed for the interactions to take place, the experimental
set-up being itself a model of what chemists know about these bodies at a given moment.

A model, philosophers of science tell us, despite their different points of view, is a 
structure, a fictitious object that generally fits into a theory, and that attempts to 
simplify, to idealize a phenomenon and that aims to allow predictions, explanations 
or observations within a field of application. Let us keep in mind the Latin etymol- 
ogy of the word model, modus, which means measure and is meant to represent a 
certain mediation with the external world (Harré and Llored 2019). 

Modeling concerns not only the mathematical processing of experimental
results but also the practical realization of the experiment, as shown above by the fabrication
of the reference matrix for the quantification of endocrine disruptors. Furthermore,
the major problem in the preparation of a contaminated sediment is that the yield of 
extraction is always less than 100%. Determining the concentration of the substance 
under study by mass balance is incorrect, as part of the material is systematically lost 
through transformation, degradation or evaporation during the sediment preparation 
protocol, which limits the accuracy of the results. 
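Extraction yields fall short of 100%, so laboratories often estimate a recovery rate and correct measured concentrations for it. They do this by spiking a sample with a known amount of analyte. The following is a hedged sketch of that common practice, not the procedure described by Bouchonnet and Kinani. The function names and the numbers (in ng/g) are hypothetical. The correction also assumes that the recovery measured on the spiked analyte applies equally to the native analyte, an assumption the matrix effect can undermine.

```python
def recovery_rate(found_spiked: float, found_unspiked: float, added: float) -> float:
    """Fraction of a known added amount that the method actually recovers."""
    if added <= 0:
        raise ValueError("the spiked amount must be positive")
    return (found_spiked - found_unspiked) / added


def corrected_concentration(found: float, recovery: float) -> float:
    """Scale a measured concentration by the estimated recovery.

    Assumes (hypothetically) that native and spiked analyte are
    extracted with the same efficiency.
    """
    if recovery <= 0:
        raise ValueError("recovery must be positive")
    return found / recovery


# Hypothetical sediment data, all in ng/g.
r = recovery_rate(found_spiked=85.0, found_unspiked=5.0, added=100.0)
estimate = corrected_concentration(found=12.0, recovery=r)
print(f"recovery = {r:.0%}, corrected concentration = {estimate:.1f} ng/g")
```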
In addition, there is always a “matrix effect”, because the different constituents of
the matrix interact with the analyzed substances during the sample preparation
phase and/or during the physicochemical analysis, sometimes modifying their
chemical reactivity. When the composition of the matrix changes, these interactions
change too. Thus, an analytical method is rigorously applicable only to
the matrix for which it has been validated. A laboratory assaying endocrine disruptors
in a river, or pesticides in plants, would therefore have to have validated a method for
each type of site or for each variety of plant with which it is confronted. Not only is 
this technically extremely tedious, but it is not compatible with the specifications of 
an analytical laboratory, in terms of both time and cost. Thus, laboratories generally 
use the same method to analyze endocrine disruptors from several rivers or to 
analyze pesticides from different salads (Bouchonnet and Kinani 2013). 

In our example of the determination of endocrine disruptors (and the same approach
holds for any determination), the quality of the analytical results rests essentially,
beyond the training and experience of the chemists doing this work, on the validity
of the methods and the reliability of the apparatus. The purpose of qualifying an apparatus is to establish that
it is suitable for its purpose and that it is maintained and calibrated in an appropriate 
manner to continue to perform this function throughout the period of its use in the 
laboratory. Once the device is qualified, the validation of the chosen method consists 
of studying its “performance” according to certain criteria (specificity/selectivity, 
accuracy, precision, linearity, limits of detection and quantification, application 
interval, robustness, repeatability, reproducibility) which mobilize many tests and 
statistical processing. Validation is the set of operations necessary to establish that 
the protocol is sufficiently accurate and reliable to have confidence in the results 
provided and for a given use. Subsequently, it will be necessary to ensure that these 
performances are maintained during the routine application of the method, by using 
conformity tests relating to the apparatus and the methods used and quality control 
samples (reference body, matrix and associated medium). In short, the purpose of 
these tests is to demonstrate that the complex including the instrument, the methods 
used, the reference substances and matrices, as well as the associated medium 
(solvent mixture), meets the requirements set for the measurement. Change
a single “detail” (the substance to be analyzed, in nature or quantity; the medium; the
method; or the apparatus) and everything changes! It is then
necessary to start all over again: to requalify the apparatus, revalidate the method, and
redo conformity tests on the samples in order to stabilize a field of measurement
for which the use of what we designate the “complex {apparatus-methods-substances-associated
medium}” leads to reliable and interpretable results.
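Several of the validation criteria listed above (linearity, limits of detection and quantification) are usually estimated from a calibration line. The following minimal sketch assumes an ordinary least-squares fit. It also assumes the convention LOD = 3.3σ/S and LOQ = 10σ/S, as found in the ICH Q2 guideline, where S is the slope and σ the residual standard deviation of the signal. Other conventions exist. The function name and the calibration data are hypothetical.

```python
import math


def calibration(conc: list[float], signal: list[float]) -> dict[str, float]:
    """Least-squares calibration line with LOD/LOQ from the residual scatter."""
    n = len(conc)
    if n != len(signal) or n < 3:
        raise ValueError("need at least three paired points")
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    syy = sum((y - mean_y) ** 2 for y in signal)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("signal must increase with concentration")
    intercept = mean_y - slope * mean_x
    # Residual standard deviation around the fitted line (n - 2 degrees of freedom).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(conc, signal))
    sigma = math.sqrt(ss_res / (n - 2))
    r_squared = 1 - ss_res / syy if syy else 1.0
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "lod": 3.3 * sigma / slope,   # limit of detection (ICH-style convention)
        "loq": 10 * sigma / slope,    # limit of quantification
    }


# Hypothetical calibration standards (concentration in µg/L, detector signal).
result = calibration([0, 1, 2, 3, 4, 5], [0.1, 2.1, 3.9, 6.1, 8.0, 9.9])
for key, value in result.items():
    print(f"{key} = {value:.4f}")
```

If the matrix, the medium or the apparatus changes, this fit has to be redone on new standards, because the slope and the scatter both belong to the complex as a whole.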

The complex {apparatus-methods-substances-associated medium} studied is
indivisible because its four elements are mutually adapted to one another, and this
mutual adaptation is what constitutes the complex. The operating conditions of the apparatus and the methods
associated with it depend on the substance studied, whose condition, texture,
granulometry or concentration depend, in turn, on this apparatus for a particular
associated medium and in relation to a given reference matrix. Chemists must
couple analytical methods (mass spectrometry and high-performance liquid
chromatography, for example) and cross-reference multiple pieces of information in order to
establish sets of co-stabilized results. In this way, they establish networks of relation- 
ships and quantifications that allow substances to be ranked against each other, for 
example in toxicological terms, taking into account their matrix origin, instrumen- 
tation, extraction processes, statistical methods, and current standards (Llored 2015, 
2021). The robustness of the results of the {instrument-methods-chemical bodies- 
associated environment} complex is an issue and is in no way, for an engineer, 
technician or researcher, a simple “initial given”: without stabilization, without 
“coadaptation” of the different elements, no factor can vary when the others remain 
constant! The verb “vary” takes on its full meaning here in relation to the stabiliza- 
tion of the “response” of the complex because, in this context of active materials, of 
substances whose reactivity depends on the environments, several changes inevita- 
bly take place simultaneously along a chain of transformation. It remains however 
possible to stabilize globally these changes, to contain them, in order to quantify 
chemical substances and to compare the results obtained, i.e., to place a measurement
result within an interval associated with a given risk of error. Such is the
challenge of quantitative experimentation in chemistry, or in any field
where the sensitivity of the objects studied to their environments becomes non-negligible
given the objectives set.
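Placing a result in an interval associated with a given risk of error is commonly done with a Student-t confidence interval on replicate measurements. The following minimal sketch assumes independent, normally distributed replicates and a 5% risk, that is, 95% two-sided confidence. To keep it dependency-free, the t quantiles are hard-coded standard table values for 1 to 10 degrees of freedom. The replicate data are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t quantiles t(0.975, dof) for dof = 1..10 (standard table values).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval_95(replicates: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) of a 95% confidence interval for the mean."""
    dof = len(replicates) - 1
    if dof not in T_975:
        raise ValueError("this sketch supports 2 to 11 replicates only")
    mean = statistics.mean(replicates)
    s = statistics.stdev(replicates)                 # sample standard deviation
    half_width = T_975[dof] * s / math.sqrt(len(replicates))
    return mean, mean - half_width, mean + half_width


# Hypothetical triplicate determination of one analyte (ng/g).
mean, low, high = confidence_interval_95([10.0, 10.2, 9.8])
print(f"{mean:.2f} ng/g, 95% CI [{low:.2f}, {high:.2f}]")
```

Lowering the accepted risk (say to 1%) would widen the interval. With few replicates, the large t quantiles make the interval wide, which is one reason stabilizing the measurement complex matters.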


3.3 Conclusion


Experimentation in chemistry is inseparable from the poly-categorical ontology that 
this science-industry expresses, and which concerns not simply substances or pro- 
cesses in isolation, but also substances and reactions that must be understood 
through analogy and the identification of differences. Chemistry comprises a set of 
classifications, methods of analysis or synthesis of existing materials and possible 
materials to be created. The dependence of substances on the medium implies 
numerous instruments, knowledge, and know-how that chemists have always 
known how to implement, regardless of whether they have an explanatory theory 
at a particular moment in the history of chemistry. The operational definition of 
substance implies an extremely strong contribution from experimentation and the 
multiplication of approaches that are much more descriptive and inductive than 
hypothetico-deductive, which does not prevent chemists from proposing theories, 
the revision of which will very often be done in a pragmatic way, and which present 
a strong heuristic and explanatory power, mobilizing, depending on the case, causes 
or correlations, reductionist or holistic approaches (Stengers and Bensaude-Vincent
2001). The results that this science-industry obtains in synthesis, mixtures and
analysis depend on an instrumentation whose technicality has always been at the 
heart of alchemical and chemical practices. As a science of the useful and the 
detailed, the study of chemistry allows us to better understand the relationship 
between technique, science, and technology, between discovery and invention. 
Chapter 4
Experimentation in the Life Sciences


Laurent Loison 


The scientific revolution, which began in the early seventeenth century, was above 
all a revolution in the scientific method. Experimentation and quantification replaced 
the authority of Ancient writings. In the field of life sciences, during the second half 
of the eighteenth century “natural history”, an essentially descriptive undertaking, 
gave way to the “natural sciences”, more openly explanatory and etiological. Certain 
branches of “biology”¹ became experimental disciplines in the following century. As
can be seen from this chronology, the appropriation of the experimental method by 
the life sciences was a lengthy and still ongoing process primarily because it was a 
thwarted process. 

The aim of this chapter is to give an account of the obstacles that have made it 
difficult to implement a methodology modelled on physics and chemistry. These 
obstacles were all due to the tension that has arisen throughout the history of science, and still
arises today, between the specific characteristics of life and the requirements of a mechanistic
science. In this regard, elements of Bergson’s philosophy (Bergson 1907) remain
current: the living cannot easily be grasped within the analytical, mechanistic interpretative
framework that makes an experimental and quantitative methodology possible.
Although science and life are not incompatible, as Bergson claimed they were, frictions
remain between the ontology of the living and the epistemology
of the experimental sciences. These frictions stem from the fact that life
exists only in the form of individuals, of organisms that have the dual particularity of
being necessarily integrated totalities and singular objects. It is thus always risky
to reduce the functioning of a living being to that of its constituent parts, just as it is
difficult to overcome the uniqueness of each individual living being. Therefore,
and the entire history of biology bears witness to this, it has been necessary to
systematically produce concepts that, to be fully operative, must respect the specificity
of living organisms and make experimentation possible, a particularly
difficult articulation to undertake. It will come as no surprise that experimentation
was first introduced into biology through the elucidation of the most mechanical
functions, the first and foremost of which was the circulation of the blood.

¹Insofar as it designates the specific science of vital phenomena, this term first appears in the
scientific literature around 1800 (Caron 1988).

The problem of experimentation in biology is therefore not only the problem of 
developing adequate equipment, nor even that of quantifying fleeting phenomena, 
but more fundamentally that of the difficulty of developing a theoretical framework 
capable of generating testable hypotheses. Following this line of interpretation, the 
progress of biological theory has led to advances in the use of the experimental 
method. This progress has been made essentially along two lines, that of an 
increased intelligibility of the hierarchical levels within organisms (the organism 
as a totality), and that of the inscription of life within a historical framework (the 
organism as a singular object). The first line of thought is the subject of the first three 
sections of this chapter. They will show how, thanks to a shift in the mechanistic 
focus, it has been possible to carry out controlled experimental investigations at ever 
lower levels of scale within organisms. The second line of thought, which concerns the
hypothetico-deductive form of reasoning appropriate to the theory of evolution, is
outlined in the fourth and last section. There, the focus will be on the epistemology
of the historical sciences, the “sciences of past causes”, and on the specificity of this
type of reasoning with respect to classical experimental methodology.


4.1 Experimenting in the Classical Age. The Demonstration 
of Blood Circulation 


In secondary and post-secondary school textbooks, the “discovery” of the circulation 
of the blood by William Harvey in the seventeenth century generally occupies a 
prominent place in the introduction to chapters dealing with animal physiology. A 
spontaneous positivism tends to portray this episode exclusively in terms of a victory 
of experimental methodology over the obscurantism of medieval scholasticism. 
Quantitative reasoning is often presented as the only aspect of the English
physician’s work worth mentioning. Without, of course, denying the central role of
quantification, a comprehensive understanding of this history cannot ignore
Harvey’s own project or the interpretative framework that enabled him to succeed.

Recall here the considerable success of the Galenic explanation of the movement 
of the humours. From Galen to Harvey, the movement of blood liquids (venous and 
arterial blood) was understood as a double unidirectional flow, from the liver (for 
venous blood) or from the heart (for arterial blood) to the organs that consumed 
them. While there was indeed movement, there was no circulation because the 
liquids never returned to their starting point. 
Harvey, who trained at the University of Padua, did not originally set out to challenge
the ideas of the Ancients, first and foremost Galen and even more so Aristotle. He
sought instead to recover them authentically by stripping away their scholastic coating
(Grmek 1990, p. 103). When he finally came to oppose Galen, he turned to Aristotle to
explain how the blood returned from the organs to the circulation, although what 
would later be called the blood capillaries remained unknown to him (Gingras et al. 
2000, p. 317). More fundamentally still, Harvey sought to give a key role in his 
demonstration to the Aristotelian analogy of the heart as the sun of the microcosm 
constituted by the organism: “The heart is thus the principle of life and the sun of the 
microcosm, just as the sun could be called the heart of the world” (Harvey 1628, 
quoted in Grmek 1990, p. 106). 

There is, therefore, as much speculation in Harvey’s work as in that of Galen or the
medieval authors. What makes his approach original is the way he articulated his
theoretical intentions with the hypothetico-deductive method and with quantitative
reasoning, which until then had been excluded from the field of physiology.
Throughout his famous De motu cordis, published in 1628, Harvey set out the
hypotheses he tested and, above all, considered the observable consequences to
which they led. Thus, by a relatively simple calculation, Harvey showed that the
quantity of blood that the heart propels into the aorta in 30 min (through, by his
estimate, at least one thousand contractions) far exceeded the total quantity of
blood in the body. The blood therefore could not be continuously produced and
consumed, as the Galenic model held: it had to return to the heart. In addition to
this quantification, Harvey carried out a series of experiments on the venous valves
showing that the venous system led the blood from the periphery to the heart, and
that the valves served to prevent blood reflux.
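The structure of Harvey's "relatively simple calculation" can be written out as a short inequality. The symbols are introduced here only for illustration and are not Harvey's notation: N is the number of contractions in 30 min, v is the volume of blood expelled per contraction, and V_b is the total volume of blood in the body.

```latex
\begin{align*}
Q_{30} &= N\,v, \qquad N \ge 1000 \ \text{contractions in 30 min},\\
Q_{30} &\ge 1000\,v > V_b \quad \text{whenever} \quad v > \frac{V_b}{1000}.
\end{align*}
```

Because N is at least one thousand, the conclusion holds for any volume per beat larger than one thousandth of the body's total blood. Taking a modern figure of roughly 5 L of blood for an adult (an assumption, not Harvey's own estimate), about 5 mL per beat already suffices. The argument therefore survives large errors in estimating v, which helps explain its force.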

Finally, based on a series of vivisections, he interpreted the functioning of the 
heart as a mechanical pump, the engine of blood circulation throughout the entire 
body. Harvey thus developed an explicitly hypothetico-deductive line of reasoning,
based on anatomical, quantitative, and sometimes experimental data (as in the case of the
elucidation of the function of valves). According to Grmek (1990, p. 110), however, 
his work did not immediately convince his contemporaries because his Aristotelian 
interpretation of the looping back of the circulation between the arterial and venous 
systems appeared too speculative and fragile. Harvey’s success was appreciated and 
recognized only in retrospect, especially when microscopic observation revealed the 
existence of blood capillaries. 

As for the relationship between the theoretical framework and experimental 
success, it is, in our opinion, François Jacob who offered the most relevant reading
of it: 

It is often said that by showing the analogy of the heart with a pump and that of the
circulation with a hydraulic system, Harvey contributed to the installation of the mechanism
in the living world. But this reverses the order of the factors. In reality, it is because the heart
functions as a pump that it is accessible to study. It is because circulation can be analyzed in
terms of volume, flow, and speed, that Harvey was able to conduct experiments with blood
similar to those Galileo conducted with stones. For when the same Harvey tackles the
problem of generation, which is not a matter of this form of mechanism, he can get nothing
out of it (Jacob 1970, pp. 43-44).

The macroscopic mechanism of the classical age, applied to the animal machine,
thus offered limited experimental prospects, since only the circulatory function
finally proved to be comprehensible in strictly mechanical terms. When one tried to 
account for the other key functions proper to living organisms in the same terms, as 
Harvey himself tried to do for generation, this approach turned out to be much less 
fruitful. It provided no basis for experimentation. 


4.2 The “Elasticity” of the Vital, the Concept of the Internal 
Milieu and the Constitution of Physiology as an 
Experimental Science 


When, at the end of the eighteenth century, Lavoisier identified respiration with 
combustion, he was, to a certain extent, following Harvey. He reduced a biological 
function to a physico-chemical phenomenon, and the (relative) validity of this 
reduction authorized the deployment of a hypothetico-deductive approach calling
for quantification and experimentation. The successes of these strictly mechanical or
chemical explanations were nonetheless tightly circumscribed victories. The functions
of living organisms generally escape this reductionism, and various forms of
vitalism emerged during the eighteenth century in reaction to what was often
perceived as a negation of the specificity of life.

Living organisms, especially animals, by virtue of the spontaneity of their
development and functioning, seemed to far exceed the constraining framework of an
exclusively mechanical intelligibility, even when enriched by the new chemistry. 
The natural sciences and medicine were caught in a difficult position, torn between 
experimental aspirations which seemed to require a mechanistic program, and the 
risk of distorting their object by reducing it to a simple physico-chemical 
mechanism. 

In the German-speaking countries, the romantic and speculative period of 
Naturphilosophie gave way, from the 1830s onwards, to a reformulation of the 
mechanistic project that tended very clearly towards an openly avowed
physico-chemical reductionism. The laboratory took over from the natural history
cabinet and from fieldwork as the privileged place for the development of knowledge
by means of an experimental methodology intended to be identical to that of the 
sciences of matter. The constitution of the modern laboratory was the result of the 
massive and systematic introduction of research instruments capable of carefully 
exploring organisms and quantifying the phenomena associated with their function- 
ing. Canguilhem was right to emphasize “that between the physiological experi- 
mentation of the eighteenth century and that of the nineteenth, the radical difference 
lies in the systematic use by the latter of all the instruments and apparatus that the 
rapidly developing physical and chemical sciences enabled it to adopt, adapt or 
construct, both for the detection and for the measurement of phenomena” 
(Canguilhem 2002, pp. 231-232). This physiological experimentalism was, for the 
leaders of this school, such as Carl Ludwig (inventor of the kymograph) and the
electrophysiologist Emil du Bois-Reymond, closely linked to an unreserved
ontological reductionism.

Other research traditions were more interested in carving out an epistemic space for a
science of vital phenomena. Thus, concurrently in France, Claude Bernard and his
students developed an experimental physiology that, although founded on the
principle of determinism (the same causes produce the same effects), did not
subordinate living things to the variables of their “cosmic environments”, as if they
were simple mechanical automata. The formation of the concept of milieu intérieur
allowed Claude Bernard to reconcile the methodological requirement of determinism
with the apparent spontaneity of (higher) animals. Because of the existence of the
milieu intérieur, a buffered and actively regulated zone between the exterior and the
cells of the body, researchers who sought to reveal the mechanism of an organic
phenomenon were obliged to undertake controlled modifications of the tissues
themselves. Opposing the idea, inherited from Lavoisier, that a simple balance between
inputs and outputs was sufficient to characterize what happens within organisms,
Bernard sought to penetrate the tissues to grasp the special processes of vital
phenomena. He challenged, for example, the traditional idea that plants were the only
organisms capable of producing the sugars that animals then consumed. Mastery of a
proven vivisection methodology, combined with the art of hypothetico-deductive
reasoning, allowed Claude Bernard to demonstrate the synthesis of sugars in the liver
(experiments that have remained famous under the generic name of the “washed
liver” experiment).

In 1865, Claude Bernard published his most famous work, the Introduction to the 
Study of Experimental Medicine, in which he retrospectively identified the canons of 
the method that he had practiced for more than two decades. He advocated for an 
active and interventionist experimental science, as opposed to the “contemplative” 
sciences based only on observation, such as zoology, botany, or nosography. 
Following in the footsteps of his colleague Eugène Chevreul, Claude Bernard opened
the book with the scheme known as “experimental reasoning” (the title of its first
part), today called the hypothetico-deductive method. Experimentation differs from
experimental reasoning in that the experimenter rigorously controls the quantitative
variation of at least one causal parameter involved in the production of the
phenomenon. This capacity for control allows the experimenter to set up control
experiments, which Claude Bernard refined through the concepts of “counter-test”
(Bernard 1984, p. 91) and “comparative experiment” (ibid., p. 183).

As is widely recognized, in the first part of his book Bernard does little more than
repeat precepts that were by then well established and widely known.² Recall that the
nineteenth century saw the birth of the philosophy of science (both the term and the
concept), which was at first almost exclusively a methodological reflection on the
procedures of science, expressed in the works of Auguste Comte, William Whewell,
and John Stuart Mill. As we learn in the second and third parts of his book, the
originality of Bernard’s thought lies in the way it approaches the difficulties
specific to applying the experimental method to vital phenomena.³ These
difficulties were essentially of two kinds. The first arose from the existence of a
milieu intérieur that gave the illusion that organisms were spontaneous with respect
to their external environment. The second arose from the marked individual
variations between organisms of the same species. It was essentially the first of
these two obstacles that occupied Bernard, who devoted himself to developing
experimental physiology as a science of the milieu intérieur. As for inter-individual
variations, it was not until the development of robust statistical methods, at the end
of the nineteenth century and throughout the twentieth century, that such variations
ceased to block the development of experimentation in medicine, ecology, or the
theory of evolution. Biology then moved, sometimes with great difficulty, from a
typological science to a population science for which variation is essential, and
which foregrounds the concepts of population, mean, standard deviation and
variance.


² As Bernard himself recognizes in the introduction (pp. 27-28).


4.3 Mechanism Reinvented: Experimentally Dismantling
Molecular Mechanics 


Starting in the middle of the nineteenth century, experimental physiology very 
quickly became cellular physiology. The cell was then understood as the minimal 
living being, the atom of biology. As a result, experimentation could not go below the
level of the cell, and quite often not even below that of the tissues. The concept of
hormone, coined at the turn of the twentieth century, briefly allowed a renewal of the
methods of physiological investigation. In the emerging field of endocrinology, 
ablation/grafting/injection protocols were established that made it possible to dem- 
onstrate the causal role of selected chemical molecules in animals. 

Around 1900, the research front in biology moved first to microbiology and then 
to genetics. As is well known, microbiology was constituted thanks to a series of
founding experiments, the most prominent of which were those of Louis Pasteur that 
showed — insofar as one can demonstrate non-existence — the impossibility or at least 
the immense improbability of spontaneous generation in nature. Microbiological 
experimentation, as practiced for instance by Robert Koch, took advantage of the 
ability to “cultivate” many microorganisms in media of limited volume. One could
thus easily multiply trials by varying the culture conditions as much as necessary.
Thanks to such experimental possibilities, the first laboratory-made vaccines were produced.

³ The second part is entitled “On experimentation with living beings”, the third
“Applications of the experimental method to the study of the phenomena of life”.


Genetics, during the first half of its existence (1900-1950), was not strictly
speaking an experimental science. Its “laws” (commonly called Mendel’s Laws)
were statistical regularities based on the results of crosses of organisms with defined
characteristics, and not on the active manipulation of the causal parameters of an
experimental system. However, this new biological discipline introduced a small 
revolution in the field of life sciences: for the first time, it was a concept — that of the 
gene — that structured the discourse and oriented the empirical work. Genetics 
approached the standards of physics and very clearly claimed this epistemological 
ambition (Morgan 1926). The rise of genetics was also the consequence of the 
systematic use of “model organisms”,⁴ a methodology that made it possible to free
researchers from the problem of the singularity of living beings. The choice of 
Drosophila melanogaster by Thomas H. Morgan’s laboratory was particularly for- 
tunate. The “fruit fly” can be easily cultivated in test tubes, reproduces very quickly, 
and has only 4 pairs of chromosomes. Between 1910 and 1915, Morgan and his 
collaborators demonstrated, based on the results of the crosses made, that Mendelian 
genes were discrete entities aligned one after the other along the chromosomes. 

Because it was not an experimental practice in the physiological sense of the term, 
“classical” genetics, based on the demonstration of statistical regularities in the 
distribution of observable characteristics over successive generations, remained 
powerless to answer the question of the causal mode of action of genes. The 
elucidation of this question required bringing experimentation into the cell,
which in turn called for a new analytical, mechanistic framework. This is exactly what
molecular biology provided at the beginning of the 1950s. Developed at the junction of
genetics and microbiology (Morange 1994), the so-called “Fluctuation Test” of
Salvador Luria and Max Delbrück showed that bacterial cells also possessed genes
and that, consequently, bacteria were not Lamarckian organisms. This experiment is
probably one of the most famous of the golden age of molecular biology
(Luria and Delbrück 1943). Schematically, it examined the consequences of
introducing a bacterial virus, a bacteriophage, into many parallel bacterial cultures.
According to the Lamarckian hypothesis, the bacteriophage itself would induce
bacterial resistance. Each culture would then contain roughly the same number of
resistant colonies after exposure, with differences between cultures no larger than
those expected from chance alone. Conversely, according to the genetic and
Darwinian hypothesis, the acquisition of resistance by bacteria is a random
phenomenon, unrelated to the gain in adaptation that it may confer and therefore
independent of the presence of the bacteriophage. Since a mutation occurring early
in the growth of a culture leaves many resistant descendants while a late one leaves
few, one would expect a much more heterogeneous distribution, reflecting the
mutational history of each culture, and this is what the observed results showed.
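A toy simulation makes the contrast between the two predictions concrete. This is a sketch under stated assumptions, not Luria and Delbrück's own analysis. The growth model (synchronous doubling from a single cell), the mutation rate `MU`, the number of generations, and all function names are hypothetical choices made for illustration. Poisson draws use Knuth's method because the standard library offers no Poisson sampler. Both models are tuned to give the same expected number of resistant cells per culture. What separates them is the spread between cultures, measured here by the ratio of variance to mean.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson-distributed count (Knuth's method; fine for the small means used here)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def darwinian_culture(generations: int, mu: float, rng: random.Random) -> int:
    """Random-mutation hypothesis: resistance arises by chance during growth,
    before any phage is added, and resistant lineages breed true."""
    sensitive, resistant = 1, 0
    for _ in range(generations):
        daughters = 2 * sensitive
        # Each new daughter mutates with probability mu (Poisson approximation).
        new_mutants = min(poisson(daughters * mu, rng), daughters)
        resistant = 2 * resistant + new_mutants  # earlier mutants keep doubling
        sensitive = daughters - new_mutants
    return resistant


def lamarckian_culture(generations: int, p_induced: float, rng: random.Random) -> int:
    """Induced-resistance hypothesis: each of the 2**generations cells present when
    the phage is added becomes resistant independently with probability p_induced."""
    return poisson((2 ** generations) * p_induced, rng)


def variance_to_mean(counts: list[int]) -> float:
    """Ratio of variance to mean: about 1 for Poisson counts, much larger with jackpots."""
    return statistics.variance(counts) / statistics.mean(counts)


if __name__ == "__main__":
    rng = random.Random(1943)
    G, MU, N_CULTURES = 20, 2e-7, 500  # hypothetical values chosen for illustration
    # With this growth model the expected Darwinian mean is about G * 2**G * MU,
    # so setting p_induced = G * MU gives both hypotheses the same expected mean.
    darwin = [darwinian_culture(G, MU, rng) for _ in range(N_CULTURES)]
    lamarck = [lamarckian_culture(G, G * MU, rng) for _ in range(N_CULTURES)]
    print("Darwinian  variance/mean:", round(variance_to_mean(darwin), 1))
    print("Lamarckian variance/mean:", round(variance_to_mean(lamarck), 1))
```

Under the induced model the counts follow a Poisson distribution, so the ratio stays close to 1. Under the random-mutation model, rare early mutations produce "jackpot" cultures with very many resistant cells, and the ratio is much larger. The distribution Luria and Delbrück observed showed this second pattern.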

The molecular role of genes was definitively elucidated by François Jacob and
Jacques Monod some 20 years later, when they jointly published the operon model
(Jacob and Monod 1961). A key step in this process was a series of experiments they
carried out in 1958 with their American colleague Arthur Pardee, which are still
known as the “PaJaMo” experiments (Pardee et al. 1959), or “Pyjama”, because
of the near equivalence of the two pronunciations in English. Briefly, these
experiments, which once again took advantage of the ability to cultivate many
microorganisms in parallel, consisted of observing the kinetics of the production of an
enzyme as a function of the genetic material exchanged between cells. The results
obtained, which surprised the researchers themselves, revealed that this activity was
regulated negatively, i.e. by the removal of an inhibitor. Their definitive interpretation
also required postulating a cytoplasmic intermediary between genes and proteins,
which would later be called “messenger RNA”. This made it possible to understand
how the genetic information contained in the DNA could, in the cytoplasm, lead to the
sequential assembly of the amino acids of proteins.


⁴ Claude Bernard’s dogs and frogs were followed in the twentieth century by many
organisms, the most famous being the fruit fly or drosophila (genetics), the
Escherichia coli bacterium (molecular biology), and the mouse (developmental
biology). The history of biology in the twentieth century has been so profoundly
shaped by these model organisms that they themselves have been a privileged prism
of historical investigation for the last 30 years. On Drosophila and classical
genetics, see Kohler (1994).

As Jacob himself pointed out, molecular biology corresponded to “a new age of 
mechanism” (Jacob 1970, p. 17), in the sense that the cell was no longer considered 
an indivisible atom, but rather a (complex) mechanism whose workings one can 
dismantle experimentally by taking advantage of new techniques from physics and 
chemistry (X-ray diffraction, electron microscopy, ultracentrifugation, etc.). In 
return, the development of molecular biology has produced numerous tools for 
manipulating living organisms, which are now the basis of modern biology: PCR, 
the molecular scissors of the CRISPR-Cas9 system, etc. 


4.4 Two Types of Biology, Two Forms of Experimental 
Reasoning: Historical and Evolutionary Reasoning 
as a Hypothetico-Deductive Methodology 


In 1961, the ornithologist Ernst Mayr, one of the founders of modern evolutionary 
theory, published a famous paper in Science in which he distinguished between two 
major forms of biology: an experimental biology seeking the “proximate causes” of 
phenomena, and a historical biology devoted to the understanding of “ultimate 
causes” (Mayr 1961). In so doing, Mayr revived the ancient Aristotelian distinction 
between efficient and final causes, and thus sought to give full scope to historical and 
evolutionary reasoning at a time when a triumphant molecular biology sought to
reshape the epistemology of the life sciences.

Evolutionary biology is a historical science in Whewell’s sense, i.e. a science
of past causes. Because it constructs a posteriori explanations for single events, it
cannot resort to experimentation as physiology, endocrinology or molecular biology
do. Indeed, what the theory of evolution by natural selection seeks to explain is the
singular evolutionary path that gradually led to the genesis of the adaptations of living
organisms (past or present). Reconstructing this path requires the elaboration of
robust scenarios concerning a sequence of past phenomena. The biologist’s approach
here resembles the methodical investigation of the historian more than the
experimental investigation of the physicist or chemist. What is involved is
experimental reasoning, in the sense of Chevreul or Claude Bernard, but not
experimentation. Charles Darwin himself had already clearly understood this.
The Origin of Species (1859) thus makes systematic use of hypothetico-deductive
reasoning based on necessarily indirect arguments. Throughout the book,
Darwin deployed data from biogeography, paleontology, animal husbandry, and
comparative anatomy to test the deductions drawn from his central hypothesis. He
sought to show that the whole of natural history could thus be reinterpreted in the
light of a single hypothesis, that of evolution by natural selection (Gayon 1992).

This non-experimental character of Darwinian theory, in which the theory is 
under-determined by the facts, became the target of criticism from French
Bernardo-Pasteurian biology (Loison 2010). Even today, the crucial role that history plays in
the explanation of living systems circumscribes the field of applicability of experi- 
mental investigation. During the twentieth century, however, this field has been 
significantly enlarged by technical and conceptual advances that have made it 
possible to overcome certain obstacles that Darwin and the first evolutionists rightly 
considered as impossibilities in principle. 

The last 30 years or so have seen the rise of what is commonly called “experimental
evolution”,⁵ i.e. a sub-discipline that aims to reproduce in the laboratory,
under controlled conditions, specific stages of an evolutionary process in order to
explore its rules of operation. The first sustained attempts in this direction used
cultures of microorganisms, as in the program of Richard Lenski’s group.⁶ Begun on
24 February 1988 and still in progress, this program follows the uninterrupted
evolution of different populations of Escherichia coli over tens of thousands of
generations. Although such an approach is obviously of interest, it is not, strictly
speaking, an experiment, in the sense that no parameter of the system is
deliberately modified to test a causal hypothesis. It is more a question of reproducing
a segment of evolution in the laboratory and carefully observing its genetic and
epigenetic aspects. Other protocols are more ambitious and seek to test, for example,
the response of experimental populations with controlled genetic characteristics to 
selection. From the point of view of its epistemology, experimental evolution lies at 
the intersection of theoretical biology, model biology and experimental biology in 
the strict sense. 


4.5 Conclusion 


The experimental character of the life sciences has thus been a conquest, constantly
renewed, depending on the disciplines considered and the historical moment. Like
the other empirical sciences, biology has had to establish its own relationship with
the world, based on concepts that define its epistemological identity, such as the
milieu intérieur, natural selection or the gene. Unlike physics or chemistry, biology
cannot do without the individuality of its elementary entities, living organisms, which
consistently renders mechanistic experimentation perilous and difficult, as this
chapter has sought to illustrate. The individuality of living organisms is always on
the brink of overflowing the theoretical apparatus of biology, and thus of thwarting
its experimental ambition.


⁵ For a synthesis in French: Thomas et al. (2016).
⁶ http://myxo.css.msu.edu/ecoli/index.html.

Unlike in physics, theory continues to occupy a relatively marginal place in
biology. It is rare in the life sciences for a problem to be posed for theoretical reasons.
Instead, empirical data and practical considerations continue to drive research
(so-called “data-driven” research). The result today is a situation that, in a way,
reproduces what natural history experienced in the eighteenth century: in the era of 
Big Data, biology has never had so much data about living organisms at its disposal, 
but it still struggles to organize it within a general theoretical framework. It is in a 
way overwhelmed by the exponential expansion of its empirical base. As has been 
the case regularly throughout its history, it is the lack of theory that limits the 
experimental character of the disciplines of biology. Empiricism, albeit molecular, 
remains empiricism. 

This chronic theoretical fragility is the consequence of the singular place occu- 
pied by biology in the natural sciences, somewhere “between law and history” 
(Gayon 1993). This rules out any integrating generalization,
since the inferences of biology will always be formulated subject to a phylogenetic
inventory.
Chapter 5
Experimentation in Psychology


Jean Audusseau 


Attempts to define scientific psychology by its object have regularly provoked lively 
controversies throughout the history of the discipline. Terms such as consciousness,
behavior, conduct, or mental processes conceal major theoretical quandaries that are
difficult to resolve today, given how young this field of research is. Nonetheless,
a consensus seems to have been reached early in the history of the discipline 
concerning the method of scientific psychology, described as “experimental”. 
Indeed, in the absence of a shared object, experimentation has served as 
psychology’s main source of identity since its origins in the middle of the nineteenth 
century: “experimental” psychology represents “the body of knowledge acquired in 
psychology through the use of the experimental method” (Fraisse et al. 1963; 
Postman and Egan 1949). The term “experimental psychology” seems to have fallen 
into disuse mainly as a result of its success. Experimentation has imposed itself in 
most of the sub-disciplines of psychology as the indisputable means of establishing 
the validity of psychological knowledge. It is regularly considered the principal 
demarcation criterion of a scientific psychology. 

As we will see in this chapter, this methodological consensus seems somewhat
fragile. Beginning with Claude Bernard’s definition, to which the first experimental 
psychologists consistently referred, the experimental method has been refined and 
has undergone fundamental adjustments. We will try to show that these adjustments 
were necessary because of the relative inadequacy of the experimental method for 
testing the theoretical hypotheses of contemporary psychologists. 
5.1 The First Uses of Experimentation


In his famous Introduction to the Study of Experimental Medicine, the physician and 
physiologist Claude Bernard presented the principles of a method of experimenta- 
tion, applied to the life sciences, whose legacy we shall henceforth describe as 
prototypical. In this section, we show how experimental psychology has been 
constructed in reference to these principles. 

Most of the first experimental psychologists of the middle of the nineteenth 
century were physiologists by training (e.g. Ernst Weber, Gustav Fechner, Hermann 
von Helmholtz). In keeping with the Comtean exclusion of knowledge of the human 
mind from the field of positive sciences, psychophysiologists proposed to account 
for the psychological phenomena of perception through the study of the physiology 
of the nervous system. They adhered to a philosophical tradition inherited from 
empiricism and anchored in the positivist context of the time. At the same time, they 
firmly rejected any form of metaphysical speculation. 

The prototypical experimental approach proposed by Bernard in the field of the 
natural sciences was the ideal means for them to establish their knowledge, insofar as 
Bernard defined experimentation as “a reasoning by means of which we methodi- 
cally submit our ideas to the experience of facts” (Bernard 1966, p. 7). This method 
had a dual function for the psychologists. First, it enabled them to claim their 
approach as truly scientific, which — especially at that time — represented an 
important source of validation and legitimized the project of institutionalizing their 
discipline. A second function of the use of experimentation, of greater interest for 
this chapter, was that it offered a robust heuristic framework for the analysis of 
psychophysical phenomena and subsequently — throughout the twentieth century 
and to the present day — for the analysis of psychological phenomena properly 
speaking. 


5.2 From Observation to Experimentation 


For both Bernard and experimental psychologists (Fraisse 1956), observation and 
experimentation are not opposed. They constitute two inseparable moments or 
modalities of the scientific approach: while it is possible to adopt an observational 
approach without resorting to experimentation, the reverse is not true. We shall 
therefore first describe the main characteristics of observation in psychology. 


5.2.1 Observation 


Observation plays a fundamental role in much of psychology. Consider the famous
research done in developmental psychology by Jean Piaget or Arnold Gesell. In
its simplest sense,¹ observation consists of collecting a set of facts considered as
manifestations of the phenomenon under study. Observation is never limited to the 
simple recording of facts; it selects facts because they inform the psychologist about 
the validity of a clearly stated hypothesis. As Bernard (1966, p. 77) also pointed out, 
“a fact is nothing by itself; it is only worthwhile in terms of the idea attached to it or 
the proof it provides”. But, conversely, the psychologist’s concept (e.g. attention
directed towards others) is worthwhile only if it can be operationalized in the form of
an empirical fact (e.g., in the case of an infant, a visual exploration of its mother’s face
lasting a specified time, measured with an eye-tracker). This concern
for operationalization — which leads to the specification of the conditions in which 
the observation is carried out — is central to maximizing agreement between 
observers, a necessary condition for the establishment of objective knowledge. It 
also makes it possible to limit — if not eliminate — the biases inherent in the observer’s 
subjectivity, such as his or her personal prejudices and beliefs. 
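The text names agreement between observers as a condition of objectivity but does not name an index for it. One widely used index for categorical codes is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance alone. The sketch below assumes two observers coding the same sequence of observation intervals; the labels and data are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two observers who coded the same events."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both observers must code the same, non-empty list of events")
    n = len(coder_a)
    # Observed agreement: share of events given the same code by both observers.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: expected overlap given each observer's own category rates.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both observers used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten observation intervals of an infant.
observer_1 = ["look", "look", "away", "look", "away", "away", "look", "look", "away", "look"]
observer_2 = ["look", "away", "away", "look", "away", "away", "look", "look", "look", "look"]
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Here the observers agree on 80% of intervals, but because both code “look” most of the time, about half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement.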

Among the many other biases that affect the quality of observation is the fact that
the mere presence of the observer can modify the behavior of the human subject. In
some studies, attempts are made to eliminate this bias by using a one-way mirror or a
camera. In most cases, however, the presence of the observer is necessary to
establish a climate of trust with the participant. In that case, rather than a “bias” to be
eliminated, the role played by the observer can be treated as data which, like the
other variables having an impact on the phenomenon, must be analyzed.

Finally, although observation should not be confused with measurement, measuring
instruments play a central role in observation, since they too contribute to the search
for objectivity. In psychology, these instruments can be
very simple (paper and pencil, stopwatch) or more complex (audio and/or video
recording, psychometric test, survey, apparatus for measuring muscular or cerebral
activity, etc.), and they enable observables to be quantified. The reliability and validity
of these instruments are generally subject to rigorous psychometric evaluation
(Dickes et al. 1994).
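The text does not say which reliability indices such an evaluation uses. As one common example for multi-item instruments such as questionnaires, Cronbach’s alpha estimates internal consistency from the item variances and the variance of the total score. This is an illustration with hypothetical scores, not the procedure described by Dickes et al. (1994).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Internal-consistency reliability; scores[i][j] is respondent i's score on item j."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))
    # Sum of the variances of each item taken separately.
    item_variance = sum(variance(item) for item in items)
    # Variance of the total score across respondents.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance / total_variance)


# Hypothetical scores of three respondents on a two-item scale.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

When items rank respondents identically, alpha reaches 1; the more the items disagree, the lower it falls.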


5.2.2 The Need to Move to Experimentation 


While observation offers the possibility of validating a hypothesis, “validation”
here has a weaker meaning than in experimentation. Since observation is not subject to the
Bernardian principle of experimental manipulation (see below), many authors
believe that it does not allow the causes of the phenomenon to be
identified conclusively. For this reason, some experimental psychologists
disparage observation alone. According to Benton J. Underwood (1966), observation
at best suggests hypotheses to a still-young science, hypotheses that then await verification
by experimentation. At worst, observation leads to false conclusions.


¹For more detail, see Reuchlin (2002).
Other experimenters, such as Fraisse, take a more nuanced view, recommending
the development of experimentation while recognizing that “psychology cannot be 
entirely constituted as an experimental science in the strict sense of the term and, to 
some extent, it will remain a science of observation” (Fraisse 1956, p. 32). Referring 
to Bernard, Fraisse argues that observation and experimentation are based on a 
relatively similar approach that always proceeds by comparing and relating facts. 
As will be explained below, the essential difference is that, in experimentation, at 
least one of these facts is not simply invoked (i.e. observed in nature), but provoked 
by an intentional intervention. 

Experimentation in psychology is based on a deterministic conception of science, 
widely shared within the natural sciences, according to which the phenomena 
studied must be explained by their causes. One recognizes here the heritage of 
mechanistic thought, characteristic of classical physics, and taken up by Bernard 
(1966, p. 95): “The absolute determinism of the phenomena of which we are aware a 
priori is the only criterion or the only principle which directs us and supports us.” 

Central to our discussion is the fact that the deterministic view can be tested 
experimentally only if it is assumed that the number of causes involved is limited;
otherwise, any investigation seems doomed to failure (Underwood 1966). Thus,
many experimental psychologists set themselves the task of reducing psychological 
phenomena to a small number of independent determinants. 


5.3 Silencing the Noise to Hear the Cause 


Experimental psychologists devote considerable energy to neutralizing a wide range 
of variations that they call “noise”, “error”, or “bias”. While we must distinguish 
between systematic error (e.g., bias that has a similar impact on all participants in the 
experiment) and random error (e.g., an inaccuracy in measurement), the idea is to 
eliminate influences considered undesirable because they mask the determinant 
under investigation (Woodworth 1938) and interfere with the reproducibility of the 
experiment (a major condition for the validity of the hypothesis being tested). This 
principle, known as “experimental control”, can be broken down into two strategies, 
depending on whether the control relates to the conditions of the experiment or to the 
differences between the individuals taking part in the experiment. 

We have already highlighted the way in which experimental psychology labs 
provide an environment where conditions can be rigidly specified. A few examples 
will complete our discussion. In studies of sensory perception, one can standardize 
the lighting of the room to an equivalent intensity for all participants, standardize the 
background on which the stimuli appear, or use soundproof rooms. In language 
experiments, meaningless words are sometimes used to counteract the effects of 
familiarity or prior knowledge of the meaning of the word. Finally, when the 
experiments include several tests, the order of these tests is varied across
participants (counterbalancing), randomly or according to a fixed scheme, in order to
avoid a possible order effect. Controlling for differences between individuals is a thorny issue for
experimental psychology, and we will discuss its limitations below. These
differences were considered negligible by nineteenth-century psychophysicists, who
therefore did not hesitate to establish experimental laws based on a single
participant.
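Besides random ordering, a common systematic way to counterbalance test order is a Latin square, in which each test occupies each serial position equally often across participants. The sketch below assumes a simple cyclic square; the test labels are hypothetical.

```python
def latin_square_orders(conditions):
    """Cyclic Latin square: each condition appears once in every serial position."""
    n = len(conditions)
    return [[conditions[(row + pos) % n] for pos in range(n)] for row in range(n)]


def order_for_participant(participant_index, conditions):
    """Cycle participants through the rows so that positions are balanced across the sample."""
    orders = latin_square_orders(conditions)
    return orders[participant_index % len(orders)]


# Hypothetical three-test session: tests A, B and C.
for p in range(6):
    print(p, order_for_participant(p, ["A", "B", "C"]))
```

A cyclic square balances serial position, but not which test immediately precedes which; designs such as Williams squares are used when such carry-over effects are a concern.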

During the twentieth century, this neglect of individual differences gradually
became untenable, and the main strategy for neutralizing these differences
consisted in using large samples of individuals (Postman and Egan 1949). The
widespread adoption of the analysis of variance proposed by Ronald Fisher in
1925 marked a major turning point in experimental psychology. It enabled the
discipline to isolate genuine universal laws within the variability of the data obtained
(e.g., the effect of belonging to a certain experimental condition on the responses of
individuals), and to treat differences between individuals as measurement error.
When the sample is large enough, it is assumed, according to the law of
large numbers, that these individual differences cancel each other out around a
central tendency. Another strategy, widely used in psychology when the experiment
consists of comparing several experimental conditions, is to ensure that the distri- 
bution of individuals in these conditions guarantees the “equivalence” of these 
conditions. This means that the individual variables that “bias” the experiment 
(e.g. age, gender, intellectual level, socio-economic category, etc.) must have 
means (and sometimes variances) that do not differ significantly between the differ- 
ent conditions. 

This control of potential influences requires tedious work when these influences
are numerous, which, given the immense complexity of psychological
phenomena, is the rule rather than the exception. Costly in terms of time and
energy, this requirement is sometimes considered, even within the experimentalist
movement, too far removed from the work of theoretical interpretation and/or as
leading researchers to study overly artificial conditions.


5.4 Manipulation as a Form of Proof 


The principle of experimental manipulation in psychology can be linked to Bernard’s
notion of counter-proof, which rests on a counterfactual conception of causality
(Lewis 1973). It should be emphasized that this principle is valid only if the isolation
of the cause from other “disturbing” influences is guaranteed (see the previous
section). An independent variable can be said to have causal status (by showing
that its removal cancels out the effect on a dependent variable) only if all other
potential sources of influence have been neutralized (i.e., all other things being
equal). Prototypically, experimentation in psychology consists in comparing the
behaviour of individuals placed in two experimental conditions, which differ only
in the role (suppressed or not) of an independent variable having the
hypothetical status of the cause of these behaviours. When the null hypothesis of
no difference between these two conditions can be rejected, the psychologist
concludes that the alternative hypothesis of causality is corroborated.
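The text does not prescribe a particular test for rejecting the hypothesis of no difference; t-tests and ANOVA are the usual choices. An exact permutation test expresses the counterfactual logic most directly: if the manipulation had no effect, the condition labels would be interchangeable, so one asks how often a relabelling of the pooled scores yields a difference at least as large as the one observed. Scores are hypothetical, and full enumeration is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def permutation_test(condition_a, condition_b):
    """Exact two-sided test of 'no difference' between two independent conditions.

    If the manipulation had no effect, the condition labels would be
    exchangeable: every split of the pooled scores is equally likely.
    """
    pooled = list(condition_a) + list(condition_b)
    n_a = len(condition_a)
    observed = mean(condition_a) - mean(condition_b)
    as_extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(mean(a) - mean(b)) >= abs(observed) - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total


# Hypothetical scores with the presumed cause present (a) or suppressed (b).
print(permutation_test([5, 6, 7], [1, 2, 3]))
```

Even with perfectly separated groups, three participants per condition cannot yield a p-value below 0.1 here, which illustrates why sample size matters for this kind of proof.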
Again, with reference to Claude Bernard, the independent variable is “provoked”
when it consists of an active and voluntary intervention by the experimenter (i.e. a 
manipulation). Here are some examples. The manipulation may involve the envi- 
ronmental conditions. Within the behaviorist stream (but not restricted to this 
stream), the presence (vs. absence) of a physical stimulus presented to participants 
is classically manipulated. In some social psychology experiments, researchers 
manipulate characteristics of the participant’s social environment (e.g., presence or 
absence of an authority figure). This manipulation can also involve the type of 
activity that is presented to participants, as in cognitive psychology research where 
the mobilization of a given information process is manipulated in a targeted manner. 
For example, in an experimental condition known as “dual-task”, participants are 
asked to perform a primary task (e.g., driving a car using a simulator) while 
performing a secondary task that requires the subject’s attention (e.g., a phone 
call, a task that requires the storage of verbal information in short-term memory). 
In a second condition, only the main task is performed. If subjects’ performance on
the primary task (e.g., a score assessing driving quality) is poorer in the first condition
than in the second, the presence of the secondary task is considered to cause an
interference that hinders the implementation of the attentional processes required in 
the primary task. We can thus conclude that the main task also mobilises the 
subject’s attentional resources (the driving activity mobilises the subject’s attention 
and decreases in efficiency when an additional attentional demand is added). If, on 
the other hand, performance is equivalent on average in the two conditions, we 
consider that the main task does not mobilize the subject’s attentional processes 
(driving is automated in this situation and does not suffer from an additional 
attentional demand). 
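Assuming, as the example suggests, that the same participants perform the primary task both alone and with the secondary task, the comparison can be sketched as a per-participant “dual-task cost” tested with a paired t statistic. The scores are hypothetical and higher scores are assumed to mean better driving.

```python
from math import sqrt
from statistics import mean, stdev


def dual_task_cost(single_task, dual_task):
    """Mean per-participant drop in primary-task score and its paired t statistic.

    Assumes the same participants performed the primary task alone and with a
    secondary task, and that higher scores mean better driving.
    """
    costs = [s - d for s, d in zip(single_task, dual_task)]
    if len(costs) < 2:
        raise ValueError("need at least two participants")
    sd = stdev(costs)
    if sd == 0:
        raise ValueError("costs are identical for everyone; t is undefined")
    mean_cost = mean(costs)
    t = mean_cost / (sd / sqrt(len(costs)))
    return mean_cost, t, len(costs) - 1


# Hypothetical driving-quality scores for four participants.
alone = [80, 75, 90, 85]
with_phone_call = [70, 70, 80, 78]
print(dual_task_cost(alone, with_phone_call))
```

A large positive t supports the interference interpretation. A mean cost near zero is consistent with automatized driving, although failing to reject the hypothesis of no difference is not, statistically, proof that the two conditions are equivalent.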

Note that while the provoked variable is often, as in the previous examples, a
qualitative variable (e.g., presence vs. absence), it can also be quantitative.
Manipulation does not then refer (as with Bernardian counter-proof) to the suppression of
the hypothetical cause, but rather to its continuous variation. For example, the
experiments of the first psychophysicists most often consisted in varying the
intensity of a physical stimulus. The famous Weber–Fechner law (Fechner 1948), according to which
sensation varies as the logarithm of the intensity of the stimulus, is a good
example.
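Strictly speaking, the logarithmic form is Fechner’s law, which he derived from Weber’s law (the just-noticeable difference is a constant fraction of the baseline intensity) by assuming that each just-noticeable difference corresponds to an equal increment of sensation. The standard textbook form is sketched below, with notation assumed rather than taken from the text: $I$ is stimulus intensity, $\Delta I$ the just-noticeable difference, $S$ sensation magnitude, $I_0$ the absolute threshold, and $k$, $c$ constants.

```latex
\frac{\Delta I}{I} = k \quad \text{(Weber's law)}
\qquad\Longrightarrow\qquad
S = c \, \ln\frac{I}{I_0} \quad \text{(Fechner's law; } S = 0 \text{ at } I = I_0\text{)}

S(2I) - S(I) = c \ln\frac{2I}{I_0} - c \ln\frac{I}{I_0} = c \ln 2
```

In other words, each doubling of stimulus intensity adds the same increment $c \ln 2$ to sensation, whatever the starting intensity; this is what “continuous variation” of the provoked variable lets the psychophysicist estimate.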

One of the main limitations of the experimental manipulation prototype is that 
many variables are not manipulable in psychology. For example, it is not possible to 
vary — let alone “suppress” — an individual’s age, gender, or socio-cultural back- 
ground, even though such variables often play a major causal role in the determina- 
tion of behavior. The main experimental strategy is then to relax the prototype by 
considering these variables as invoked, which brings us closer to an observational 
approach as presented above. In a way, it is nature that takes charge of “manipulat- 
ing” the hypothetical cause, the role of the experimenter being essentially limited to 
isolating the cause. Although we saw above that some experimental psychologists 
tend to reject this less prototypical approach, we note here that it is sometimes — 
indeed often — the only one that can be pursued. 
Too close an attachment to observation provoked by the experimenter would also
deprive the psychologist of the study of many psychopathological phenomena. For 
obvious ethical reasons, it is not possible to induce mental illness or brain damage in 
humans (hence the frequent use of animal models in neuroscience). As Daniel 
Lagache, one of the main representatives of clinical psychology in France, observed, 
“the psychology of jealousy in love, of crimes of passion, of suicide has little to 
expect from experimentation” (Lagache 1949, pp. 13–14).


5.5 Limitations and Adjustments of the Experimental Method


Any empirical science must justify its theories by a credible process of proof, and we 
have seen above that the prototype of experimentation offers undeniable advantages 
in psychology from this point of view. We would like, however, to emphasize in this
section how the limits of this prototype, when confronted with the particularities of
psychological phenomena, have made its evolution necessary.


5.5.1 The Test of an Indivisible Totality 


A long theoretical tradition in psychology, which can be described as “holistic”
(or “molar”), can be contrasted with the “elementalist” (i.e., “analytic” or
“molecular”) approach characteristic of experimentation. Based on Reuchlin’s (1995) work,
we distinguish three main currents within this holistic tradition.

Humanistic psychologists, strongly influenced by phenomenology and often 
attached to the field of clinical psychology, consider that lived experience necessar- 
ily takes the form of an indivisible whole. To isolate selected elements experimen- 
tally would, on this view, make no sense, nor would it be possible to reconstitute the 
lived experience by synthesis, from the various feelings, ideas, or perceptions 
previously isolated. Following Wilhelm Dilthey (1894), these psychologists are
generally wary of attempts to explain mental life based on its determinants, and
prefer an approach aimed at understanding (Verstehen), based on the history of the
patient as s/he subjectively perceives it.

While calling for a union of the naturalistic and humanistic traditions of psychology,
the major school of Gestalt (“form”) psychology (Lewin 1936) also subscribes
to a holistic conception. In the field of perception, Gestalt psychologists condemn the
elementalism of a “content psychology” (as opposed to a container psychology) that
attempts to dissociate the perceived “microscopic” units. The Gestalt tradition, often
criticized despite its continued development (Wagemans et al. 2012), considers that 
perceptual experience takes place according to global forms that can be apprehended 
only at the “macroscopic” level. The functionalist current in psychology is also 
characterized by a distrust of elementalism, although in a less clear-cut manner.
William James (1891), after extensive training in experimental physiology (but also 
in philosophy), insisted that the successive states of consciousness presented a 
temporal continuity, which could not be broken down experimentally without losing 
the inherent coherence of living organisms. Strongly inspired by Darwinian thought, 
James believed that the solidarity and continuity of the states of consciousness were 
fundamentally explained by the fact that these states ensured an adaptive function for 
the organism (see also the work of John Dewey 1896). Taken in isolation, the
components of this system could not account for this adaptive
function. This functionalist conception, inspired by biology, has had a major impact
on current theories in cognitive psychology and neuroscience. 


5.5.1.1 Psychic Functioning as a Probabilistic Dynamic System 


The evolution of psychology during the twentieth century was characterized by a 
major epistemological break constituted by a leap into complexity. Inspired by 
developments in physics (Prigogine and Stengers 1986), the cognitive psychology 
that emerged in the 1950s evolved towards a representation of human functioning as 
a dynamic system of interacting processes (Thelen and Smith 2006; van Geert 2011). 
For several reasons, the prototype of experimentation was ill-suited to this
theoretical framework.

First, the multivariate nature of this system comes to the foreground: an individual’s
conduct, which cannot be decomposed in the prototypical sense, emerges from this
system of interacting processes. Influences within the system are not the
unidirectional causal influences characteristic of the experimental prototype; rather, they are
conceived as bidirectional influences, in the form of feedback loops that give the
system its self-organized functioning and temporal dynamics.
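To make bidirectional influence concrete, here is a deliberately simple linear toy system in which two hypothetical processes, x and y, feed back on each other with a one-step lag. The coefficients are arbitrary assumptions, and the models cited above are typically nonlinear and stochastic; the sketch only shows why a single cause is hard to assign once a loop exists.

```python
def simulate_feedback(steps, a=0.5, b=0.3, c=0.5, d=0.3, x0=1.0, y0=0.0):
    """Two hypothetical processes x and y that influence each other with a one-step lag.

    a, c: how strongly each process carries over its own past state.
    b, d: cross-influences (y -> x and x -> y); setting either to 0 breaks the loop.
    """
    xs, ys = [x0], [y0]
    for _ in range(steps):
        x, y = xs[-1], ys[-1]
        xs.append(a * x + b * y)  # x depends on its own past and on y
        ys.append(c * y + d * x)  # y depends on its own past and on x
    return xs, ys


with_loop, _ = simulate_feedback(2)
without_loop, _ = simulate_feedback(2, b=0.0, d=0.0)
print(with_loop[2], without_loop[2])
```

With the loop in place, an initial perturbation of x returns to x by way of y (0.34 instead of 0.25 after two steps), so the state of x at that point cannot be attributed to a single, isolated determinant.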

Second, these models are probabilistic, and grant a different epistemological 
status to chance and variability (Lautrey 2003). Rather than a set of biases to be 
controlled experimentally, certain forms of variability have a strong heuristic poten- 
tial. They manifest the changes that the organism undergoes in its interactions with 
its environment. Contemporary scientific psychology no longer seeks, as Clark L. Hull
(1943) or Fraisse (1956) did, to move beyond a conjectural stage marked by
unpredictability in order to achieve strictly deterministic prediction. Rather,
it endorses Egon Brunswik’s probabilistic functionalism (1952), according to which
psychology must renounce the isolation of causes and focus more on the highly
probabilistic encounter between the characteristics of the organism and the situation.
Psychologists inspired by this new conception also tend to abandon the experimental
context of the laboratory, since it aims to neutralize sources of variability
that they consider highly informative. They prefer instead the context of
everyday life. This concern for ecological validity leads them to a more
moderate use of experimental manipulation, which, in their view, makes the
phenomenon artificial.
5.5.1.2 Methodological Implications for Data Modeling


In order to connect their theoretical discourse to the empirical data on which it rests,
psychologists have developed the use of statistical modelling. Depending on their
degree of attachment to the prototype of experimentation, this use takes diverse
forms.

With the invention of the analysis of variance by Ronald Fisher (1925), experi- 
mental psychologists discovered a method that was entirely consistent with the 
prototype of experimentation, and which is still widely used in contemporary 
research. Despite subsequent developments of this method during the twentieth 
century (i.e. the possibility of introducing several independent variables and then 
testing the hypothesis of an interaction between these variables), it remains quite 
limited when it comes to testing hypotheses concerning the systemic nature of the 
phenomenon. 

Another statistical tradition, older and closely linked to the history of psychology, 
is more consistent with a conceptualization in terms of a complex system: the 
correlational approach, which makes it possible to analyze associations within a 
system of variables. This method, which Émile Durkheim used in his famous work
on suicide (Durkheim 1897) under the heading “concomitant variations”, can be 
considered as a form of “experimental reasoning” — in the sense in which Bernard 
used this term — but it differs from experimentation in the prototypical sense 
presented above. Without going into the details of these models, note that their 
multivariate nature makes it possible to analyze simultaneously, without dissociating 
them, the relationships — recursive or not — that exist within a system of variables. In 
order to account for the dynamics of the system, these variables can be measured on 
multiple occasions in the individual and analyzed in the framework of structural 
equation models for longitudinal data. Some of these variables may be provoked,
but in psychology they are most often variables invoked by observation.
Therefore, the correlational approach is regularly criticized by the defenders of the 
experimental prototype, as was the observational approach: the concomitant varia- 
tion of two invoked variables does not as such offer proof of their causal relationship. 
Karl Pearson responded to this criticism by simply rejecting the notion of causality in 
favour of association: “How often do we hear, when a new phenomenon is discov- 
ered, the question: what is its cause? This is an unanswerable question, whereas the 
question: to what degree are other phenomena associated with it? can easily be 
answered and lead to valuable knowledge” (Pearson 1892, cited by Reuchlin 2002). 
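The objection raised by defenders of the experimental prototype can be illustrated with Pearson’s own coefficient. In the hypothetical data below, a common cause z drives both x and y, which never act on each other, yet x and y show strong concomitant variation. Subtracting z removes the association only because z’s influence is known here to have a coefficient of 1; in practice this is what partial correlations or structural equation models try to estimate.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: covariation of deviations from each variable's mean."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a common cause z drives both x and y; x and y never act on each other.
z = [1, 2, 3, 4, 5]
noise_x = [0.5, -0.5, 0.5, -0.5, 0.0]
noise_y = [-0.5, 0.5, 0.5, -0.5, 0.0]
x = [zi + e for zi, e in zip(z, noise_x)]
y = [zi + e for zi, e in zip(z, noise_y)]

print(round(pearson_r(x, y), 3))              # strong concomitant variation
print(round(pearson_r(noise_x, noise_y), 3))  # none once z is taken out
```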


5.5.2 The Test of Individual Differences 


The use of experimentation in psychology is strongly associated with the search for
universal laws, to the detriment of accounting for inter-individual
differences. The neutralization of these differences by various experimental control
procedures (see above) amounts to treating individuals as interchangeable
entities, which constitute more or less biased realizations of an ideal standard
(operationalized by a statistical average). For Bernard, scientific truth was to be
found in the lawfulness of this norm, and individuality thus constituted “one of the most
considerable obstacles of biology and experimental medicine”. To which Georges 
Canguilhem answered, in a now famous critique, that Bernard’s position was “a 
rather naive way of ignoring that the obstacle to science and the object of science are 
one and the same” (Canguilhem 1952, p. 203). 

The aggregation of differences was, however, also criticized by Bernard himself. 
In an often-quoted passage, he humorously mocked a physiologist who collected a 
urine sample from the toilets of a large European railway station, and thus thought he 
could reconstruct “average European urine”. This average, which neutralized the 
important inter-individual differences in the composition of urine, was a description 
of something that is never found in nature and was therefore of little interest to 
Bernard. And yet, Bernard’s experimental approach was itself oriented towards the 
identification of general laws, to the detriment of considering the variability inherent 
in living beings. As Canguilhem (1952) noted, Bernard seemed to admit his own 
inability to describe reality when he first defended the existence of an idealized 
norm, the only source of identification of a scientific law, and then recognized that 
this norm had no reality. 

Experimental psychologists, who are very attached to the Bernardian principle of
the lawful nature of phenomena, frequently aggregate human individuals while
simultaneously invoking the principle of isolation of a single determinant. If all
else is to be equal, and since throughout the behaviourist period only the situation
in which individuals found themselves was allowed to play this role of determinant,
individual variations had to remain silent. The prototypical experiment in psychology consists in
observing a mean difference between two experimental conditions, without consid- 
ering the possibility — which is often the rule — that this difference may take on very 
different, even opposite, values depending on the individuals. 
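When the same individuals are tested in both conditions, the mean difference can be set beside each person’s own effect. The hypothetical data below give a clearly positive average effect while two of the six participants show an effect in the opposite direction. In a between-subjects design, such individual effects are not observable at all; only the mean difference is.

```python
from statistics import mean


def mean_and_reversals(condition_a, condition_b):
    """Average within-person effect (b - a) and the individuals whose effect has the opposite sign."""
    effects = [b - a for a, b in zip(condition_a, condition_b)]
    average = mean(effects)
    reversed_ = [i for i, e in enumerate(effects) if e * average < 0]
    return average, reversed_


# Hypothetical scores of six participants tested in both conditions.
control = [10, 10, 10, 10, 10, 10]
treatment = [16, 15, 17, 14, 7, 6]
print(mean_and_reversals(control, treatment))
```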

The questioning of this principle, increasingly common in contemporary work, 
can be explained by the major influence of a tradition which has long remained 
outside of experimental psychology: differential psychology. Differential psycholo- 
gists seek to account for the structural relationships that exist within a large set of 
interrelated variables; each of these variables makes it possible to differentiate a 
sample of individuals in terms of a particular psychological characteristic. To 
achieve this, differential psychologists invented and then developed correlational 
methods, and it is not surprising that this sub-discipline was a pioneer in the 
recognition of the systemic (or “structural”) character of human functioning. 

Experimental psychology and differential psychology, which had long opposed 
each other, recognized in the middle of the twentieth century that they could 
overcome their respective limitations by acknowledging their complementarity. 
Experimental psychology should not limit itself to studying the effect of variations 
in situations while ignoring individual differences and, conversely, differential 
psychology should not study individual differences without considering the role of 
the situations in which individuals were placed. Lee Cronbach (1957) thus called for 
the development of a unified science, which takes as its object individual conduct
conceived as the product of an interaction between individual and situational 
characteristics (see also Fraisse 1956). It would be an exaggeration to suggest that 
the birth of cognitive psychology at this time was a response to this call for 
unification, insofar as the processes extracted from the “black box” of the behavior- 
ists were mainly seen as general processes subject to the effect of situational vari- 
ables (i.e. in the experimental tradition). Nevertheless, while maintaining the 
hypothesis of a central role for environmental stimuli in the mobilization of these 
processes, many contemporary cognitive psychologists also study individual
variations in the mobilization of these processes. This attempt at
unification is manifested by a significant relaxation of the experimental method, via 
the joint study of independent individual and situational variables (invoked and/or 
provoked). 


5.5.3 The Test of Singularity


We have seen that the use of correlational methods allows psychologists to account 
for the structure of individual differences. It should nevertheless be noted that, in this 
approach, the behaviour of an individual is always evaluated in relation to that of the 
other individuals in the sample. Correlation (in the Pearsonian sense), which eval- 
uates the way in which two variables are characterized by a stability of individual 
deviations from the sample mean, is fundamentally based on a comparison of each 
individual with a population norm. Prototypical experimentation neglects the singu- 
larity of the individual’s functioning, and a similar criticism can be levelled at the 
differential approach, which focuses on the study of differences between individuals. 
This approach postulates that laws identified on the basis of a comparison between 
individuals can be transposed to the intra-individual level of analysis (i.e. to the level 
of the individual and his/her variations over time and according to the situations s/he
encounters). It seems, however, that the structure of inter-individual differences is not 
isomorphic to that of intra-individual differences (the principle of non-ergodicity) 
(Molenaar 2004). The correlational method used at the inter-individual level can thus 
lead to erroneous conclusions when it is heedlessly generalized to the intra-individual 
level. 
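Non-ergodicity can be made concrete with hypothetical repeated measures of two variables, labelled “speed” and “errors” purely for illustration. Within each person, going faster goes with more errors (correlation +1), whereas across people, those who are faster on average make fewer errors (correlation of the person means −1). Pooling all observations yields a negative correlation that describes none of the individuals.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical repeated measures: person -> (speed, errors) on three occasions.
people = {
    "p1": ([1, 2, 3], [8, 9, 10]),
    "p2": ([4, 5, 6], [5, 6, 7]),
    "p3": ([7, 8, 9], [2, 3, 4]),
}

# Intra-individual: how the two variables covary within each person over time.
within = {name: pearson_r(x, y) for name, (x, y) in people.items()}
# Inter-individual: how person-level averages covary across the sample.
between = pearson_r([mean(x) for x, _ in people.values()],
                    [mean(y) for _, y in people.values()])
# Pooled: all observations thrown together, ignoring who produced them.
pooled = pearson_r([v for x, _ in people.values() for v in x],
                   [v for _, y in people.values() for v in y])
print(within, between, pooled)
```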

Intra-individual level variations (which are reduced to the status of noise in 
prototypical experimentation and in the traditional differential approach) appear, 
however, increasingly essential to contemporary psychologists. Indeed, the more
complex the phenomenon, the more likely it is that the interplay of potential
causes will be expressed in a singular way in each individual. Rather than neglecting
this singularity, or trying to neutralize it, some psychologists try to refocus their level 
of analysis on the singular case without abandoning the scientific character of their 
approach. This so-called “idiographic” approach (Lamiell 1981) — initially proposed 
by Gordon Allport (1937) in opposition to the traditional “nomothetic” approach — 
entails profound methodological modifications, which can nevertheless be linked to 
the prototypical experimental method, as well as to the correlational method. The
common principle underlying this research is to substitute the multiplicity of mea- 
sures (carried out in a single case) for the multiplicity of cases (encountered through 
a single measurement). 

The transposition of prototypical experimentation to the single-case level gives rise to
studies termed “single-case experimental design” (Kazdin 2011). The principle,
which is not new (Skinner 1966), consists in measuring the dependent variable in 
a single subject on several occasions, while varying the level of an independent 
variable. A common example in the clinical field is to assess a given psychological 
phenomenon (e.g., the occurrence of a symptom, or the efficiency of a process) 
during an initial phase, and then to experimentally introduce a change in the patient 
(e.g., the initiation of a psychotherapeutic treatment) during a second phase. Various 
statistical methods (e.g., a randomization test) can then be used to test the impact of 
this experimental manipulation on the phenomenon under study. By administering 
the counter-test in this way, one of the major pitfalls of group studies (such as
randomized controlled trials) is avoided, namely concluding that a treatment is
effective on average while ignoring the fact that the treatment may be ineffective, or even
harmful, in certain specific cases.
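The text mentions a randomization test as one option for such designs. One common variant for an AB design assumes that the session at which treatment starts was drawn at random, before data collection, from a set of admissible start points; the test then asks how the observed change ranks among the changes that every admissible start point would have produced. The symptom counts, the admissible range, and the prediction of a reduction are all hypothetical.

```python
from statistics import mean


def ab_randomization_test(series, actual_start, admissible_starts):
    """Randomization test for an AB single-case design.

    Assumes the intervention start point was drawn at random from
    `admissible_starts` before data collection, and that the hypothesis
    predicts a reduction (e.g., fewer symptoms) after the start.
    """
    if actual_start not in admissible_starts:
        raise ValueError("the actual start point must be one of the admissible ones")

    def reduction(start):
        # Phase A = sessions before `start`, phase B = sessions from `start` on.
        return mean(series[:start]) - mean(series[start:])

    observed = reduction(actual_start)
    as_extreme = sum(reduction(s) >= observed - 1e-12 for s in admissible_starts)
    return observed, as_extreme / len(admissible_starts)


# Hypothetical symptom counts over ten sessions; treatment began at session index 5.
symptoms = [8, 9, 7, 8, 9, 4, 3, 4, 2, 3]
print(ab_randomization_test(symptoms, actual_start=5, admissible_starts=range(3, 8)))
```

The smallest attainable p-value is 1 divided by the number of admissible start points, so with five options this test cannot go below 0.2; more admissible points, or designs with several phases, are needed for stronger conclusions.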

The correlational approach has also been adapted to the idiographic approach, 
based on the original proposals of Raymond Cattell et al. (1947) (P-technique factor
analysis). Recent methodological developments in time series analysis and network
analysis (Bringmann et al. 2016) have made it possible, for example, to account for 
the dynamics of the subject’s functioning based on the systemic relationships 
between variables (see the previous section on the notion of a probabilistic dynamic 
system), using a set of variables measured repeatedly in a single subject. In this type 
of analysis, we can distinguish the effect of a variable on itself over time (autocor- 
relation allows for a trajectory analysis), but also the effect of a variable on another 
variable in the system in a synchronous manner (contemporaneous effects) or
diachronically (effects lagged in time). Moreover, this method can contribute to establishing
a causal relationship by relying on the general idea that a cause necessarily
precedes its effect. Under certain conditions, it is possible to refute a causal
hypothesis when the temporal changes of the independent variable do not precede those of
the dependent variable. 
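The time-series and network models cited above typically rest on vector autoregression; a lagged correlation is a much simpler stand-in that shows only the temporal-precedence idea. In the hypothetical single-subject series below, y echoes x one day later, so x(t) correlates perfectly with y(t+1) while y(t) does not predict x(t+1). Precedence is necessary but not sufficient for causation: a third variable acting on both with different delays would produce the same pattern.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def lagged_r(cause, effect, lag):
    """Correlate cause at time t with effect at time t + lag (lag >= 0), within one person."""
    n = len(cause)
    return pearson_r(cause[: n - lag], effect[lag:])


# Hypothetical daily measures in a single subject: y echoes x one day later.
x = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
y = [0, 0, 1, 0, 0, 2, 0, 0, 1, 0]
print(round(lagged_r(x, y, 1), 3))  # x(t) with y(t+1)
print(round(lagged_r(y, x, 1), 3))  # y(t) with x(t+1)
```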

Finally, whether experimental or correlational, the idiographic approach makes it 
possible to avoid a frequent error characteristic of the nomothetic approach, which 
consists in identifying a law at the level of a sample and then mistakenly assuming 
that this law applies indiscriminately to each of the individuals in the sample. One may, 
however, doubt the scientific character of an approach limited to the level of the individual 
case. According to a well-known principle, all scientific knowledge must have a 
general scope. Though the analysis of a single case may seem fully justified in the 
eyes of the psychologist, a hypothesis verified at this level alone cannot constitute a 
psychological law. In order to avoid this double pitfall (erroneous generalization or 
absence of any generalization), it is possible, after first analyzing the functioning 
of several individuals in an idiographic approach (i.e. by considering them in 
isolation), to proceed to an informed grouping of these individuals (Juhel 2018). In 
other words, it is a matter of creating assemblages of individuals on the basis that, 
within a given group, the same psychological model accounts for individual data 
patterns. This avoids the pitfall of overgeneralization by recognizing that different 
individuals may have different psychological models, and avoids the absence of any 
generalization by clustering individuals who share a common psychological profile. 

This fundamental question of the level(s) of analysis (inter-group, inter-individual, 
intra-individual) at which psychology can achieve a satisfactory generalizability 
of its propositions is particularly crucial at a time when the scientific validity of 
the discipline is regularly debated by assessing its capacity to replicate its results 
(cf. the replication crisis in psychology: Stanley et al. 2018). This complex 
problem, which concerns each of the levels of analysis we have distinguished, is 
beyond the scope of this chapter. We simply note that a result can be replicated on 
another sample (or in another individual) only if it is general in 
nature. The failure of a replication attempt may of course be due to the illusory or 
false nature of the result, but it can also stem from the fact that the new sample 
(or the new individual) does not share the psychological functioning of the first. 
Taking up Wilhelm Windelband’s ideas on the distinction between nomothetic and 
idiographic knowledge (Lamiell 1998), it seems to us that the psychologist’s concern 
to generate general (and therefore potentially replicable) scientific knowledge cannot 
be satisfied by constant recourse to aggregation. Psychological knowledge will be 
considered “general” when it manages to produce law-like descriptions of the diversity, 
complexity and dynamics inherent in human behaviour. 


5.6 Conclusion 


It would not be fair to conclude this presentation by saying that the prototype of 
experimentation, inherited from Bernard’s thinking, belongs to a bygone era in 
psychology. Many current publications faithfully reproduce the principles of this 
prototype, to which psychology students are introduced from the beginning of their 
studies. The limitations we have outlined, however, are recognized by a growing 
number of psychologists. Important methodological advances in recent years offer 
ways to overcome these limitations, and clearly shift the demands of researchers 
toward the accommodations we have presented. In particular, the principle of 
reduction to the most elementary determinants possible is now widely decried. 
While the principle of experimental control remains, the proportion of variations 
traditionally interpreted as noise to be neutralized has tended to decrease significantly, 
which allows for the integration of these variations within the complex 
system of variables in operation. This extension of the system also allows scientific 
psychology to go beyond the laboratory context in order to contribute to the 
explanation of the behaviors of subjects in their ordinary conditions of life. 
Finally, the reluctance to adapt the prototype of experimentation stems mainly 
from a tenacious attachment to the principle of experimental manipulation (via the 
use of provoked variables) as the means of inferring causal relations. The use of invoked 
observational variables, which is frequent in the systems approach, continues to be 
confronted with its apparent incompatibility with a causal inference approach. It 
should be noted, however, that the principle of manipulation, however useful it may 
be, is not a necessary condition for such causal inferences (astrophysicists, for example, 
demonstrate the existence of causal relations without having recourse to this 
principle). While the demonstration of a statistical relationship between two variables 
does not constitute a demonstration of their causal relationship, the absence of 
such a statistical relationship can provide a convincing argument to refute a causal 
hypothesis. Recent developments in correlational methods also offer perspectives 
that prove compatible, under certain conditions, with the causal inference approach 
(Juhel 2015). Most importantly, the observation of a statistical relationship 
can sometimes be legitimately interpreted causally when it corroborates a theoretical 
causal hypothesis specified a priori. Bernard himself affirmed that “it is not the case 
[...] that the hand of the experimenter must always intervene actively to bring about 
the appearance of phenomena”, and he added that the scientist’s observations can be 
“provoked by a preconceived idea about the cause of the disturbance” (Bernard 
1966). Once again, we see that the somewhat rigid prototype that we have drawn 
from the valuable heritage of Bernard’s work should not make us forget the 
complexity and flexibility of his thought. 
Chapter 6 
Experimentation in Sociology 


Dominique Raynaud 


This chapter examines the following questions: Can we practice experimentation in 
sociology as in other sciences? Does sociology present specific difficulties? If so, 
what are they? While these questions appear well circumscribed and rather limited in 
scope, the answers offered here shed significant light on certain fundamental questions, 
such as the unity and diversity of the sciences, and the epistemological status 
of sociology. 

Within the sciences, sociology is without question one of the disciplines that 
occasions the most contradictory judgments with regard to its status as a science. 
Some say that it is an ordinary science: “It is not difficult to find theories as solid in 
the social sciences as in the natural sciences” (Boudon 1995, p. 377), while others 
deploy significant ingenuity to show that its scientific status relies on special 
standards of scientificity. Dedicated to the “interpretative reconstruction of reality”, 
it belongs “to another form of the scientific mind than that exemplified by the natural 
sciences” (Passeron 1991, p. 27, p. 32). As we shall see, the study of forms of 
experimentation in sociology sheds light on these contradictory viewpoints. 

Before reviewing the work in experimental sociology — from sham experiments to 
randomized controlled experiments, using a selection that reflects the diversity of 
approaches in the field — some history is necessary. Durkheim is often regarded as 
the originator of the experimental method in sociology (Berthelot 1995). This 
opinion is only partially true. Durkheim’s position can be understood only if we 
first distinguish between experimentation (the act of manipulating variables to 
measure their effects) and experimental reasoning (reasoning about this variation). 
Despite the experimentalist position sociologists often attribute to the Rules of 
Sociological Method, Durkheim opposed direct experimentation. Arguing that “all 
artificial experiments are impossible” (Durkheim 1895, p. 159), he felt that 
sociology should employ indirect experimentation, i.e. the “method of concomitant 
variation”. Some sociologists have defended anti-experimentalist arguments that go 
far beyond Durkheim’s intermediate position. These arguments are extremely 
general and lack contact with actual experimental research (whereas every experiment 
can, and must, be the object of a detailed critique). Although the anti-experimental 
arguments were admissible in Comte’s or Durkheim’s time, when 
experimental sociology was in its infancy, this is no longer the case, since a vast 
literature has accumulated on the subject.¹ 


6.1 Forms of Experimentation in Sociology 


I propose here to review the main experimental forms encountered in sociology. 
Closer to analytical epistemology than to normative epistemology, this review does 
not aim to rank the experimental forms, but to take note of their variety, by pointing 
out their limits and possible weaknesses. The articulation of sociology with the 
general methodology of experimentation sometimes involves borrowing the technical 
vocabulary common to all experimental studies (control variables, control group, 
experimental group, etc.), and sometimes importing labels that grade the 
validity of the results obtained by applying the experimental method: 
quasi-experiment, controlled experiment, controlled experiment with random 
assignment. 


6.1.1 Social Experiments 


In the wake of the “action-research” of the 1970s, some sociologists claimed to be 
practicing social experimentation, an expression which corresponds to the American 
“field experiment”. These studies form a composite whole, covering almost the 
entire spectrum running from experiments without control to regular quasi-experiments. 
The general framework of these studies is defined as follows: “Social 
experimentation involves the implementation of innovative social arrangements, 
followed by an evaluation of public policies concluding in success or failure” 
(Gurgand 2014). Experimentation here means that a novel device is implemented 
to solve a practical problem, with frequent invocation of the right to make mistakes. 
No specific method for evaluating the experiment was prescribed. These “social experiments” 
were therefore distinguished from classical experiments in two ways: the substitution 
of a utilitarian aim for a knowledge aim, and only optional recourse to the experimental 
method, with its requirements of variable control, randomization and double blinding. These 
distortions, which were frequent but not systematic, justify the sometimes harsh 
judgments found in the literature: “In France, all evaluations of employment and 
training schemes are non-experimental” (Perez 2000, pp. 145-163). 

In the mid-1990s, the desertion of rural areas reached a peak in the French 
departments composing the “diagonal of emptiness” (Creuse, Cantal, Lozère, 
Gers, Ariège). In 1994, a social experiment was launched in communities with 
fewer than 2000 inhabitants to make life easier for residents who had not joined the 
rural exodus. Among the solutions envisaged were multi-service points, which brought 
together the different services that had disappeared one after the other over the years: 
grocery store, post office, pharmacy, library, internet access, etc. A few years later, 
the system was evaluated and extended to other municipalities. The evaluation, 
however, was based only on testimonials and common-sense considerations. Unless 
I am mistaken, there is no record of a scientific experiment carried out at that 
time on multi-service points. The term “experiment” here is synonymous with 
“implementation of a novel practical solution” and is foreign to the usual considerations 
concerning a scientific experiment. Social experiments therefore include 
actions that are more or less consistent with the experimental method. Even in the 
most favorable cases, however, they can be distinguished from scientific experiments 
properly speaking, because the hypotheses to be tested are not theoretical but 
practical (social experiments always substitute a utilitarian purpose for the aim of 
gaining knowledge). 


6.1.2 Opinion Polls: Experimentation or Observation? 


Calling a study an experiment is not in itself a guarantee that 
it meets the canon of the experimental method. Without denying their social 
usefulness, or even their intellectual value if one is interested in the variety of 
methods that can be used, opinion polls (typically election polls) are not 
experiments. 

Opinion polls have been called experiments in the wake of the work of Jean 
Stoetzel, who conducted the first opinion poll in France in August 1938. Stoetzel 
created Ifop (Institut français d’opinion publique) the following November and 
developed the methodological aspects of polling in his secondary doctoral thesis, 
L’Étude expérimentale des opinions (Stoetzel 1943). Because it extrapolates a result 
from a part to the whole from which that part is drawn, the opinion poll is 
sensitive to the homogeneity of the population and the representativeness of the 
chosen sample. A sample is said to be empirical if the individuals are chosen to 
reproduce the characteristics of the population on the main socio-demographic 
variables (quota method). A sample is probabilistic if the individuals are drawn at 
random, which makes sense only if the survey involves large numbers. Prima facie, 
one might be tempted to justify the experimental nature of opinion polling by the fact 
that the method of matching the sample to the population is the same as that used to 
match the experimental group to the control group in a controlled experiment. We 
could also invoke the idea that polls meet the Bernardian criterion of modification: 
the pollster “modifies” public opinion to “make it appear in circumstances in which 
nature did not present it to him”, since asking individuals a question can change 
their perception of a subject and their behaviour in response to it. It is difficult, however, 
to discern the features of scientific experimentation here. Even if the opinion poll 
resembles an experiment in certain respects, it differs in its aim: the poll does not 
aim to produce new facts, but to gather an accurate representation of public opinion, 
an approach that is closer to observation than to experimentation. The aim is to 
learn, for example, whether the popularity of a sitting president is rising or 
falling, or to obtain an indicator of the degree to which the electorate is 
inclined towards this or that party at the time of an election. Unlike the experimental 
sociologist, who builds his protocol around a scientific question, the pollster does not 
test a scientific hypothesis. He seeks to obtain a representation of opinion: on the eve 
of the election, the pollster tries to predict who will be elected. Consequently, the 
difference between opinion polling and experimentation lies not so much in the 
techniques used as in the goal pursued and the way the data are prepared and 
analyzed. 
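The contrast between the two kinds of sample described above can be sketched in Python as follows. The population, the single quota variable (age group) and the function names are assumptions for illustration; real quota designs cross several socio-demographic variables, and the interviewer's free choice within each quota, which is the method's known weakness, is only approximated here by a shuffled pass through the population.

```python
import random
from collections import Counter

# Hypothetical population described by one socio-demographic variable.
rng = random.Random(1)
population = [{"id": i, "age": rng.choice(["18-34", "35-64", "65+"])}
              for i in range(10_000)]

def probability_sample(pop, n, rng):
    """Simple random sample: every individual has the same chance of selection."""
    return rng.sample(pop, n)

def quota_sample(pop, n, key, rng):
    """Quota sample: reproduce the population's shares on `key`.
    Interviewers' choices within each quota are not random in practice;
    a shuffled pass through the population stands in for them here."""
    shares = Counter(person[key] for person in pop)
    quotas = {k: round(n * count / len(pop)) for k, count in shares.items()}
    sample = []
    for person in rng.sample(pop, len(pop)):  # shuffled copy of the population
        if quotas[person[key]] > 0:
            sample.append(person)
            quotas[person[key]] -= 1
    return sample

srs = probability_sample(population, 500, rng)
qs = quota_sample(population, 500, "age", rng)
print("random:", Counter(p["age"] for p in srs))
print("quota: ", Counter(p["age"] for p in qs))
```

Neither procedure manipulates anything: both only decide whom to observe, which is consistent with classing the opinion poll as observation rather than experimentation.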


6.1.3 Indirect Experimentation 


Indirect experimentation is a form of experimental reasoning that does not deal with 
facts artificially produced by the experimenter, but with information collected about 
spontaneously occurring facts, which are assumed to be ideally mixed, or, alternatively, 
with information assumed to be representative of the 
state of the world. 

Historically, the overrepresentation of this form of experimental reasoning in 
sociology can be explained by the successive positions adopted by founders of 
sociology such as Comte, Durkheim and Mauss. Taking up Auguste Comte’s idea 
that “experimentation proper seems [...] to be entirely forbidden to 
the new science” (Comte 1864, p. 307, 48th Lesson), Émile Durkheim himself 
declared it impossible: 


The physical-chemical sciences and even the biological sciences are close enough [to the 
model of direct experimentation] that, in a large number of cases, the demonstration can be 
considered practically sufficient. But this is no longer the case in sociology because of the 
excessive complexity of the phenomena, combined with the impossibility of any artificial 
experiment (Durkheim 1895, p. 159). 


Durkheim sought an equivalent in sociology for this forbidden experimentation. 
Based on a distinction between experimentation (concrete manipulation of variables) 
and experimental reasoning (reasoning about differences), Durkheim defended in 
Chap. 5, “Rules for the Demonstration of Proof”,² of his Rules of Sociological 
Method, the idea that “indirect experimentation”, also called the “comparative 
method”, could be used as a method of proof in sociology. It should be noted that 
the experimental reasoning that applies to indirect experimentation is entirely compatible 
with one of the main characteristics of the experimental method, namely the 
testing of predictions.³ 

This position in favor of indirect experimentation was taken up again in full by 
Paul Fauconnet and Marcel Mauss in 1901: “Experimentation is not possible 
[in sociology]; one cannot voluntarily create typical social facts that one could 
then study. It is thus necessary to resort to the comparison of the various social 
facts of a same category in various societies, to try to release their essence. Basically, 
a well conducted comparison can give, in sociology, results equivalent to those of an 
experiment” (Mauss and Fauconnet 1901, p. 36). 

The enthusiasm for the “comparative method” in sociology as an avatar of the 
experimental method cannot be understood without mentioning the converging 
judgments of the discipline’s founding figures. 

Durkheim, for example, advocated indirect experimentation: “When [...] the 
production of facts is not at our disposal and we can only approximate them as 
they have spontaneously occurred, the method used is that of indirect experimentation” 
(Durkheim 1895, p. 153). 

Let us leave aside the fact that this recommendation is based on the erroneous 
belief in “the impossibility of any artificial experiment” (ibid., p. 159): at the very 
least, it encouraged sociologists to practice more indirect experimentation. Durkheim 
himself offered a model for the practice in his study of suicide (Durkheim 
1897). We restrict ourselves to a well-defined case. Examining the statistics on 
suicide in Switzerland, we find that the suicide rate is about four times higher in the 
north than in the south (326 vs. 86 suicides per million inhabitants). Durkheim 
wondered about the origin of this variation and discovered that the suicide rate was 
distributed according to religion. There were more suicides in the northern cantons, 
which were predominantly Protestant, and fewer in the southern cantons, which 
were predominantly Catholic. There was thus a “concomitant variation” between the 
suicide rate and religion. Studying the structure of Protestantism, Durkheim noted 
that nothing in the religion prepared people for suicide. Religion had only an indirect 
influence on behavior: Protestantism favored free will. Individuals were self-determining 
beings on all sorts of subjects, and so, too, with regard to life and 
death. How did Durkheim proceed? In contrast to the previous cases, an explanatory 
hypothesis was tested by cross-tabulating statistical data to reveal a covariation of two 
factors. This is what makes the comparative method an indirect experiment. It should 
be recalled that Durkheim did not have all the refinements of modern statistical 
analysis at his disposal and that the covariation of two variables does not always 
indicate a causal relationship. Lazarsfeld made this clear with the example of a 
covariation between the number of births and the number of stork nests, which are 
obviously not causally related (Lazarsfeld 1966). When such spurious relationships are 
suspected, they are usually handled by multivariate analysis. 


6.1.4 Quasi-Experimentation 


Since the 1960s, it has been customary to use the term “quasi-experimentation” for a form of 
controlled experimentation that distinguishes the independent variable, whose effect 
is measured, from the control variables, which are held fixed, but that does not involve the 
random assignment of individuals to the experimental and control groups. It is, if 
you like, the intermediate stage between indirect experimentation and controlled 
experimentation, which will be discussed later. 

The name “quasi-experimentation” traditionally given to these studies is 
misleading because it suggests that quasi-experiments do not quite follow the 
experimental method. In fact, experimental reasoning remains intact. The prefix 
“quasi” refers to the fact that the experiment is not randomized (which diminishes 
its statistical validity but does not go to the heart of the experimental reasoning). The 
term was coined by Donald T. Campbell (Campbell and Stanley 1963; Campbell 
1968) in reference to the randomized experiments introduced by Ronald A. Fisher 
(termed pure experiments), with a view towards emphasizing the value of continuing 
to carry out non-randomized experiments of the type that William A. McCall had 
conducted in the early 1920s (before Fisher formulated his proposal for randomized 
experiments). It is in this context that we must situate the example of sociological 
experimentation that follows. F. Stuart Chapin (1888-1974), professor of sociology 
at the University of Minnesota, was the author of the manual Experimental Designs 
in Sociological Research (Chapin 1947), which long served as a reference guide for 
constructing a sociological experiment. Chapin is known to have conducted several 
direct experiments himself. I will limit myself to his classic study, “An Experiment 
on the Social Effects of Good Housing”, published in the American Sociological 
Review (Chapin 1940). The study was conducted in partnership with two federal 
agencies that were part of Roosevelt’s New Deal policy: the Public Works Administration 
Housing Division [PWA] and the United States Housing Authority 
[USHA]. Chapin’s study sought to identify the effects of relocating families living 
in a slum at Sumner Field in the northwest suburbs of Minneapolis, Minnesota. The 
families concerned were numerous enough to be divided into two groups of comparable 
size: relocated families formed the experimental group, and non-relocated families 
formed the control group. The experiment initially planned to study 108 relocated 
families vs. 131 families who remained in the slum. After the two groups were matched, 
these numbers were reduced to 44 vs. 38 families. The effects of rehousing 
were studied at 12-month intervals. 
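Matching after the fact explains why the groups shrank so much. Since Chapin's final groups were of unequal size (44 vs. 38), which suggests that he did not form one-to-one pairs, the Python sketch below shows one simple way of matching group compositions: keeping only families whose profile on the matching variables occurs in both groups. The two variables, the family records and the function names are hypothetical, and we do not claim this is Chapin's exact procedure.

```python
def profile(family, covariates):
    """Tuple of a family's values on the matching variables."""
    return tuple(family[c] for c in covariates)

def restrict_to_common_profiles(experimental, control, covariates):
    """Keep only families whose covariate profile appears in both groups,
    so that neither group contains profiles absent from the other."""
    common = ({profile(f, covariates) for f in experimental}
              & {profile(f, covariates) for f in control})
    return ([f for f in experimental if profile(f, covariates) in common],
            [f for f in control if profile(f, covariates) in common])

# Hypothetical families described by two of the matching variables.
relocated = [
    {"id": "E1", "size": 4, "income": "low"},
    {"id": "E2", "size": 6, "income": "low"},
    {"id": "E3", "size": 3, "income": "medium"},
    {"id": "E4", "size": 4, "income": "low"},
]
waiting = [
    {"id": "C1", "size": 4, "income": "low"},
    {"id": "C2", "size": 3, "income": "medium"},
    {"id": "C3", "size": 5, "income": "low"},
]
exp_kept, ctrl_kept = restrict_to_common_profiles(relocated, waiting, ["size", "income"])
print(len(relocated), "vs.", len(waiting), "->", len(exp_kept), "vs.", len(ctrl_kept))
```

Every family whose profile has no counterpart in the other group is discarded; with ten matching variables instead of two, such losses mount quickly, and the surviving groups need not be of equal size.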

The results showed that the rehousing of the slum families 
influenced the social conditions of the rehoused individuals: a reduction in overcrowding, 
an increase in social status, and above all a resocialization of family members, 
which led to better access to employment (an aspect not included in the study). 
According to the statistical analysis, for the experimental group there was “a one in 4,638 
chance that the difference in social engagement was due to chance” (ibid. p. 874). 
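A figure such as "one in 4,638" expresses how probable a difference at least as large would be if group membership made no difference. One way to compute such a probability is a permutation test, sketched below in Python on invented scores. This is not necessarily the test Chapin used, and the scores and the function name are assumptions for illustration.

```python
import random
import statistics

def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """One-sided permutation test: estimated probability that randomly
    relabelling the scores yields a difference in means at least as large
    as the observed one (treated minus control)."""
    rng = random.Random(seed)
    observed = statistics.mean(treated) - statistics.mean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:k]) - statistics.mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself and keep p above zero.
    return (hits + 1) / (n_perm + 1)

# Invented participation scores for two small groups.
treated = [14, 15, 16, 17, 18, 15, 16, 17]
control = [10, 11, 12, 13, 12, 11, 10, 12]
print("p =", permutation_p_value(treated, control))
```

On this reading, a probability of one in 4,638 (about 0.0002) would mean that only about two random relabellings in ten thousand produce a difference as large as the one observed.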

The method was not perfect, however, and the small size of the two groups (44 vs. 
38 families) was only the most apparent of the problems posed by this experiment on 
Minneapolis slum families. When we look at how the two groups were created, we 
see that the families in the experimental group and the control group were matched 
after the fact on ten variables. This procedure is typical of controlled experiments 
without random assignment of families to the two groups, which justifies the name of 
quasi-experiment given to this type of study. A more serious bias resulted from 
the fact that families in the experimental group were selected from among those who 
were candidates for occupancy of new housing, while families in the control group 
were selected from among those placed on a “waiting list”. Chapin explained that the 
families on this waiting list were “drawn from the group of applicants who had been 
thoroughly investigated by USHA officers but not immediately accepted as residents 
[of the new housing] because they were living in unhealthy but not strictly unsafe 
housing, because their incomes were not known with certainty, or because there 
were doubts about their economic or social stability” (ibid. p. 869). 

This calls for two comments. Firstly, the (non-random) assignment was thus not 
made by the experimenter but by USHA officials. In other words, the choice to place 
a family in the experimental or control group was not subject to the requirements of 
the experimental control but to the objectives of the USHA. Secondly, the homogeneity 
of the control group was based, not on fixed values of the independent variables, 
but on the uncertainty of the USHA agents about these values. This 
uncontrolled variation introduced a bias: it affected the comparison of scores of 
the experimental and control groups. 


6.1.5 Controlled Experimentation 


The principle of controlled experimentation is to test several groups (minimally, an 
experimental group and a control group), isolating only the variables that we want to 
study. This involves making a list of the variables, ensuring that they are separate, 
and varying them one by one, while the other variables are fixed (remain at constant 
values). The only difference between the experimental group and the control group 
should be the independent variable whose effect is being measured. 

The application of the independent variable to the experimental group produces 
an observable effect called the dependent variable. Controlled experimentation is 
thus a way of relating the values of an independent variable to those of a dependent 
variable. In order to increase the statistical validity of experiments, randomized 
experiments are increasingly used. In these experiments, an attempt is made to 
minimize bias by leaving the placement of participants in the two groups to chance 
as far as possible. This is usually simple random assignment, but there are 
also strategies for combining randomization and matching (e.g., by pairing two 
individuals with the same values on the control variables, who are then randomly 
assigned, one to each of the two groups). 
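The combination of matching and randomization just described can be sketched in Python: participants are sorted on their control variables, adjacent participants are paired, and a coin toss within each pair decides who joins the experimental group. Pairing sorted neighbours is our approximation of "same control variables" for cases where exact ties are rare; the field names, data and function name are hypothetical.

```python
import random

def matched_pair_assignment(participants, key, seed=None):
    """Sort participants on their control variables, pair neighbours, and
    randomly decide within each pair who goes to the experimental group.
    With an odd number of participants, the last one is left unassigned."""
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)
    experimental, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        first, second = ordered[i], ordered[i + 1]
        if rng.random() < 0.5:  # coin toss within the pair
            first, second = second, first
        experimental.append(first)
        control.append(second)
    return experimental, control

# Hypothetical participants described by two control variables.
people = [
    {"id": i, "age": age, "education": edu}
    for i, (age, edu) in enumerate([(20, 12), (20, 12), (35, 16), (35, 16),
                                    (50, 10), (50, 10), (65, 12), (65, 12)])
]
exp_group, ctrl_group = matched_pair_assignment(
    people, key=lambda p: (p["age"], p["education"]), seed=3)
for e, c in zip(exp_group, ctrl_group):
    print(e["id"], "vs.", c["id"], "age", e["age"], "education", e["education"])
```

Because chance decides only within pairs, the two groups are balanced on the matched variables by construction, while randomization protects against bias from variables that were not measured.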

Although randomized experiments were conceived by Fisher in the 1930s, they 
were first applied in medicine and were not generalized to the social sciences until 
much later. Historically, controlled experiments were conducted on small groups in 
which randomization was not considered feasible. Randomization only came to the 
fore in experiments conducted on large groups. 

A controlled experiment of this type was imagined by Samuel C. Dodd (1956) on 
the theme of the diffusion of information in an urban environment. This research, 
financed by the Human Resources Research Institute of the United States Air Force, 
was carried out at the Washington Public Opinion Laboratory. It aimed to determine 
the factors that influence the dissemination of information. The experiment was 
carried out in four American cities of comparable size (1000 inhabitants), whose 
identity was not revealed. These cities were matched based on a dozen ecological 
and demographic characteristics listed in the 1950 census, but also on the basis of 
their economy and local history (ibid., p. 426). This research meets the definition of a 
controlled experiment. 

The initial hypothesis was that a message passed from one person to another 
would diffuse differently depending on its content: an important subject in the news 
should spread more quickly and widely than information on a timeless subject. 
Dodd’s experiment corroborated the hypothesis that information spread more 
completely and quickly if individuals perceived the information as important. 

This direct experiment can be criticized radically, in the sense that the criticism 
bears on the very conditions under which the 
experiment was carried out. Although Dodd had taken the experimental control 
procedure quite far (pre-selection of the four cities on the basis of the most recent 
census data; selection on the basis of additional criteria including the type of 
economy and local history; selection of starters [individuals selected to initiate the 
diffusion of information] working in the same occupations, and thus choice of 
occupations represented in the four cities), it is clear that the entire diffusion process 
was driven by the initial choice of the starters. The more the starters in the four cities 
differ⁴ from each other, the more likely it is that the second person in the chain (No. 2) 
will differ; the more No. 2 differs, the more likely No. 3 will differ; and so on. The 
choice of the starter is therefore decisive, since it determines not only the 
first individual to whom the message is addressed, but also the entire 
chain of the dissemination process. Given these conditions, it is imperative that the 
starters be carefully matched on as many variables as possible. In Dodd’s experiment, 
however, the starters were matched solely on occupation, leaving out many 
variables such as age, gender, and political or religious affiliation, which were likely to 
influence message diffusion. Furthermore, Dodd’s experiment had an overall bias in 
that the matched starters all belonged to professions in contact with the public. The 
results thus inform us, strictly speaking, about the dissemination of information by 
individuals in professions in contact with the public, and not about the ways in which 
information is disseminated by all individuals in society as a whole. 
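The path dependence at the heart of this criticism can be illustrated with a toy relay in Python. The acquaintance network and the rule that each holder passes the message to the first acquaintance not yet informed are hypothetical simplifications, not Dodd's protocol; they only show how two starters matched on occupation alone can set off entirely different chains.

```python
def relay_chain(contacts, starter, max_len=10):
    """Follow a message passed from person to person: each holder hands it
    to the first acquaintance who has not heard it yet."""
    chain = [starter]
    informed = {starter}
    current = starter
    while len(chain) < max_len:
        nxt = next((p for p in contacts.get(current, []) if p not in informed), None)
        if nxt is None:  # nobody left to tell: the chain stops
            break
        chain.append(nxt)
        informed.add(nxt)
        current = nxt
    return chain

# Hypothetical acquaintance network; "grocer_1" and "grocer_2" share an
# occupation (the only matching variable) but not their acquaintances.
contacts = {
    "grocer_1": ["teacher", "priest"],
    "grocer_2": ["mechanic", "farmer"],
    "teacher": ["mayor", "grocer_1"],
    "mechanic": ["farmer", "clerk"],
    "mayor": ["clerk"],
    "farmer": ["priest"],
}
print(relay_chain(contacts, "grocer_1"))
print(relay_chain(contacts, "grocer_2"))
```

The two chains share no one after the starter: every difference at step 1 propagates to step 2 and beyond, which is why matching starters on a single variable leaves the whole diffusion process uncontrolled.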


6.1.6 Laboratory Experimentation 


Among the forms of experimentation known in sociology, we should finally mention 
laboratory experiments. These aim to produce knowledge under controlled conditions, 
like classical controlled experiments, but operate in a particular context, either 
by studying social interaction between human beings in the laboratory, or by 
modelling aspects of social behaviour. 

Historically, this form of sociological experimentation is more recent than the 
previous ones, both in its conception and in its implementation. It long fell prey to 
the impossibility argument of Comte and Durkheim, mentioned above, according to 
which sociology could not resort to direct experimentation. 

Durkheim’s judgment has often been understood as the literal impossibility of 
conducting laboratory experiments. This is how, for example, Jean-Michel Berthelot 
commented on the passage in the Rules: 


Sociology can neither reproduce phenomena in the laboratory like the physicist nor create 
new ones by subjecting an element to the action of various factors like the physiologist or the 
psychologist of the time (Berthelot 1995, p. 18). 


However, as the subsequent development of experimental research has shown, 
artificial experiments are not impossible in sociology. Durkheim considered them 
so only because he had in mind a holistic sociology in which it was a question of 
causally explaining a global state of society by a previous global state of society. 
From the holistic point of view, one cannot put a society into a laboratory, which is 
enough to condemn artificial experiments. But with the emergence of other currents 
in sociology, such as interactionism and methodological individualism, this objection 
has gradually faded. Indeed, as soon as the sociologist focused on small groups, 
artificial experimentation became feasible. 

Stemming from the research of Alex Bavelas and Harold Leavitt on small groups 
(Bavelas 1950; Leavitt 1951), social experiments on artificial groups were introduced 
in the 1970s (Webster Jr. and Kervin 1971; Bonacich and Light 1978) based 
on the argument that, since all experimentation has an “artificial” character, there is 
basically no a priori division between the artificial and the natural, and there is 
nothing that cannot be artificialized within the framework of an experiment. This 
trend has developed in the last few decades and the state of the art can be found in the 
collective volume, Laboratory Experiments in the Social Sciences (Webster Jr. and 
Sell 2014). We offer here an example of laboratory experimentation belonging to the 
category of “experiments in which human beings interact in the laboratory”. 

As a distant echo of research on small groups, graph theory and sociometry (Moreno
1954), a whole field of research has developed today on the conditions and modalities of the
construction of social networks. The experiments are generally conducted through
computer-mediated social interactions.

Linda Molm, from the University of Arizona, has been interested in the relational 
structures that foster the emergence of social solidarity (Molm 2007). Her “Building 
Solidarity through Generalized Exchange” (Molm et al. 2007) falls into the category 
of randomized controlled laboratory experiments. Social interaction networks are
distinguished according to whether reciprocity is direct or indirect and whether the flow
of benefits is one-way or two-way. The experiment generally corroborates the
explanatory mechanisms put forward as hypotheses on the genesis of social solidarity:
moving from negotiated exchange (N) to reciprocal exchange (R) and then to
generalized exchange (G), the indicators of social solidarity increase and conflict
decreases. The results, obtained on semantic differential scales, were subject to
statistical control. The study is very robust and easily withstands modifications, such
as changes in sample size or the use of psychological indicators to measure social solidarity,
which is a sociological concept.

The study raises the question of the sample size on which the experiment was
conducted. A total of 308 people participated in this study. The authors used a 3 × 2
factorial design (three forms of exchange, two network sizes) mobilizing triads and
tetrads, with N = 10 networks in each case (Molm et al. 2007, p. 219, p. 227). These
data allow us to calculate the number of individuals involved: 10 N triads and 10 N
tetrads, i.e. (3 × 10 + 4 × 10) = 70 individuals; the same number applies to the R triads
and tetrads and to the G triads and tetrads, which gives a total of 3 × 70 = 210 individuals (out of
308). This number is confirmed by the caption of Table 6: “Responses of the individual
subjects (210 total) in the 60 networks” (ibid., p. 234). The number of individuals in
each cell is therefore 30 or 40, which is slightly below the number that is generally
accepted for statistical purposes. Sixty or 80 individuals in each cell (420 individuals
in total) would have consolidated the results. This remark is in no way specific to the
Molm et al. study, nor even to laboratory experiments: it is a common criticism in the
experimental literature.
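The head count above can be checked with a few lines of Python. This is a minimal sketch that assumes, as the text does, that the 3 × 2 design crosses the three forms of exchange (N, R, G) with two network sizes (triads and tetrads), with 10 networks per cell; the constant and function names are illustrative only.

```python
# Reconstruction of the head count in the 3 x 2 design discussed above
# (assumption: 3 forms of exchange x 2 network sizes, 10 networks per cell).
NETWORKS_PER_CELL = 10
GROUP_SIZES = {"triad": 3, "tetrad": 4}
EXCHANGE_FORMS = ("N", "R", "G")  # negotiated, reciprocal, generalized


def individuals_per_cell(networks_per_cell):
    """Number of subjects in one (form of exchange, network size) cell."""
    return {name: size * networks_per_cell for name, size in GROUP_SIZES.items()}


def total_individuals(networks_per_cell):
    """Subjects across all six cells of the design."""
    per_form = sum(individuals_per_cell(networks_per_cell).values())
    return per_form * len(EXCHANGE_FORMS)


def total_networks(networks_per_cell):
    return networks_per_cell * len(GROUP_SIZES) * len(EXCHANGE_FORMS)


print(individuals_per_cell(NETWORKS_PER_CELL))   # {'triad': 30, 'tetrad': 40}
print(total_individuals(NETWORKS_PER_CELL))      # 210
print(total_networks(NETWORKS_PER_CELL))         # 60
# Doubling the number of networks gives 60 and 80 subjects per cell:
print(total_individuals(2 * NETWORKS_PER_CELL))  # 420
```

The figures match the 210 subjects and 60 networks reported in the caption of Table 6, and the 420 subjects of the suggested larger design.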


6.2 Conclusion


Criticism of experimental sociological work is part of the scientific process. We 
know nothing absolutely; we know with certainty only relative to a given state of the art.
This common criticism of scientific results, and of the conditions in which they are 
produced, shows the strengths and weaknesses of experimental research, and pri- 
marily serves to stimulate further research to answer unsolved questions. 
Evaluative and constructive by nature, this critique also allows us, through 
increasing generality, to make a diagnosis of the questions raised at the beginning 
of the chapter, namely the question of the unity of the sciences and the epistemo- 
logical status that sociology should be granted. The first observation is that, even if 
they are not taught, or are taught very little, in France, all forms of experimentation
are known and practiced in sociology, from sham experiments to controlled
laboratory experiments. The argument that experimentation is impossible in sociology
is therefore the product of limited knowledge of the experimental research that
has been carried out by sociologists for nearly a century. The argument is
reproduced, so to speak, from text to text, i.e. by drawing on past judgments in the
literature that are supposed to enlighten contemporary sociology. A simple review of
existing works in experimental sociology, which run to hundreds of
publications, shows the futility of endorsing this judgment.

Secondly, the obstacles that stand in the way of experimental sociology are no 
different in nature from those faced by experimenters in other disciplines. Thus, the 
problem of sample size is a typical problem that arises, even in large-scale experi- 
ments, when one seeks to draw a conclusion from a small group. Researchers are 
regularly led to declare that, for such and such a subgroup, the result is not 
significant. What the discussion of sample size shows, therefore, is not that sociol- 
ogy is subject to a special regime of scientificity, but rather that it is subject, like all 
the other sciences, to the imperatives and limits of the scientific method that apply 
elsewhere. It is therefore difficult to argue from these considerations for a different
epistemological status of sociology, just as it is difficult to build an
argument against the unity of the sciences. That the facts studied by one science are
of a different nature from those studied by others does not therefore justify the 
emphasis on radically different methods. In any case, this is not the conclusion that 
emerges from this examination of the practice of experimentation in sociology. 
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Chapter 7 ®) 
Experimentation in Economics se 


Herrade Igersheim and Mathieu Lefebvre 


According to Eber and Willinger (2012, p. 8), experimental economics can be defined
as “the use of experimentation as a method of investigation in economics.” Begun
after World War II, experimental economics has, from the outset, been the subject of
much questioning and methodological debate, which has in no way hindered its
spread within the economics community: its use is now quite common and widely
appreciated.

While some practices differentiate experiments carried out in economics from 
those conducted by other sciences, economic experiments nevertheless obey several 
more general rules. The aim is to observe the behaviour of the experimental subjects 
to collect data. But, following Bernard’s (1966) distinction between observation and
experimentation, the experimenter does more than the observer, since s/he defines,
via the experimental protocol, the modifications made during the game, the most
favourable conditions for carrying out the research, etc. Moreover, experimental
economics shares two fundamental features of any experiment regardless of the
discipline, namely control and replication. Indeed, controlling the experiment and its
progress makes it possible to guarantee the reliability of the results and, above all,
their correct interpretation: “control is the essence of experimental methodology”
(Smith 1976, p. 275; cited also by Serra 2012b, p. 25). Replication is also a crucial
aspect of any experiment in that it allows other researchers to reproduce a similar
experiment, which makes the results more robust.

Consider the dictator game. This experiment, which has been replicated many 
times, aims to identify pro-social behaviour. It consists of studying an extreme 
situation in which a participant, called the dictator, is asked to distribute an initial 
endowment between himself and another participant, called the receiver. The 
receiver cannot intervene and is therefore constrained to accept the dictator’s 
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decision. The anonymity of the participants is guaranteed in such a way that neither 
the dictator nor the receiver can identify the other. In the hypothesis of a homo 
oeconomicus, an egoistic dictator who maximizes individual gain, the dictator has no 
interest in sharing the sum with the receiver. The dominant game-theoretic strategy is 
to keep the entire endowment for oneself. Any transfer to the recipient can only be 
due to an altruistic or social preference motive for redistribution. This simple 
experiment and the many variations that followed showed that, on average, 40% 
of dictators transferred a sum to the recipient. These results revealed an important 
altruistic component in individual economic decisions that goes beyond the standard 
microeconomic hypothesis of purely selfish behavior. This is now considered a 
robust finding. 
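The payoff logic of the dictator game can be sketched as follows. This is a minimal illustration rather than a protocol from the literature: the function names, the whole-unit endowment and the list of transfers are hypothetical, chosen only to show why a purely self-interested dictator transfers nothing and how the share of dictators making a positive transfer is computed.

```python
def dictator_payoffs(endowment, transfer):
    """Return (dictator, receiver) payoffs; the receiver has no decision to make."""
    if not 0 <= transfer <= endowment:
        raise ValueError("transfer must lie between 0 and the endowment")
    return endowment - transfer, transfer


def selfish_transfer(endowment):
    """Transfer chosen by a purely self-interested dictator, who maximizes
    his or her own payoff over all whole-unit transfers."""
    return max(range(endowment + 1), key=lambda t: dictator_payoffs(endowment, t)[0])


def share_of_givers(transfers):
    """Fraction of dictators who transferred a strictly positive amount."""
    return sum(1 for t in transfers if t > 0) / len(transfers)


print(selfish_transfer(10))  # 0
# Hypothetical transfers from ten dictators endowed with 10 units each
# (illustrative numbers, not experimental data).
print(share_of_givers([0, 0, 3, 0, 5, 0, 2, 0, 0, 4]))  # 0.4
```

Any observed share above zero is, by construction, a departure from the homo oeconomicus prediction returned by `selfish_transfer`.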

In this chapter, after reviewing some aspects of the emergence of experimental
economics, we detail its principles and practices, highlighting the extent to which
these differ from those employed by psychologists and make it possible to overcome
several shortcomings. In a final section, we offer two examples of games that show
the complementarity between laboratory and field experiments.


7.1 A Short History of Experimental Economics 


As Guala (2008) and, more recently, Serra (2012a) and Cot and Ferey (2016) have
pointed out, the history of experimental economics has yet to be written. Recent work has
nevertheless greatly advanced our knowledge through oral history (Svorencik 2015).
Although the first experiment in economics is usually traced back to
the St. Petersburg paradox, brought to light by Bernoulli¹ (Roth 1995; Serra 2012a;
Cot and Ferey 2016), it was not until after the Second World War that experimental
economics emerged as a true sub-discipline of economic science in the wake of game
theory. Following Serra (2012a), this history can be broken down into four periods:
(1) emergence, which lasted until the early 1960s; (2) consolidation from the 1960s
to the 1980s, due in particular to two guiding lights, Vernon Smith and Charles
Plott; (3) takeoff in the 1980s with the growth of experimental economics laboratories
in the United States, the acceptance of studies employing experimentation in the
most prominent economics journals and the stabilization of a scientific community
dedicated to the practice; (4) finally, maturity from the 1990s to the present day,
which has seen, among other things, the recognition of the sub-discipline through the
award of the Bank of Sweden’s Prize in Economic Sciences in Memory of Alfred
Nobel, half of which was granted to Vernon Smith in 2002 “for making the
laboratory experiment an instrument of empirical economic analysis, particularly


¹For the record, the St. Petersburg paradox illustrates the fact that a simplistic decision criterion,
based solely on mathematical expectation, leads to choices that no individual would really make. It
is based on a lottery game with an infinite expected gain, which participants are nevertheless
willing to pay only very small amounts of money to play.
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in the study of different market structures” (press release: The Bank of Sweden’s 
Prize in Economic sciences in Memory of Alfred Nobel 2002). 
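The divergence behind the St. Petersburg paradox can be made concrete in a few lines. The sketch assumes the common textbook version of the lottery, in which a fair coin is tossed until it first lands heads and the player receives 2^k if this happens on toss k (probability 1/2^k); the function name is hypothetical. Exact fractions are used so that no rounding hides the pattern.

```python
from fractions import Fraction


def partial_expected_value(max_tosses):
    """Sum of the first max_tosses terms of the St. Petersburg expectation:
    payoff 2**k received with probability (1/2)**k."""
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, max_tosses + 1))


for n in (10, 100, 1000):
    print(n, partial_expected_value(n))
```

Each additional toss adds exactly 1 to the expectation, so the partial sums grow without bound, whereas participants are willing to pay only small amounts to play.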

The emergence of experimental economics was marked by several isolated 
initiatives carried out essentially in the United States in various fields of economic 
theory: decision theory, industrial organization theory, game theory. Most of these 
fields are highly multidisciplinary in that they are the result of collaborations 
between economists, psychologists, and mathematicians. We can thus attribute the 
first experiment in decision theory to a psychologist, Louis Thurstone, in the 1930s. 
In the wake of work by von Neumann and Morgenstern (1947), this research was
pursued further in the early 1950s by Maurice Allais, who submitted an experimental
survey to participants in the Paris congress on decision making in economics, including
Leonard Savage. This survey would eventually lead to the elaboration of the famous
paradox intended to demonstrate the contradictions in the expected utility theory
developed by John von Neumann and Oskar Morgenstern (Allais 1953). At Harvard,
a statistician, Frederick Mosteller, and a psychologist, Philip Nogee, also worked on 
these aspects (Mosteller and Nogee 1951). In the field of industrial organization, the 
economist Edward Chamberlin, also at Harvard, conducted the first market experiments,
designed to reveal how markets malfunction. He developed a protocol in which his
students tested theoretical predictions for a perfectly competitive market (Chamberlin
1948). Vernon Smith, then a student, took part in 1952 and continued this
work after his appointment at Purdue in 1955 (Smith 2002). At the same time, two
mathematicians from the RAND Corporation, Merrill Flood and Melvin Dresher,
conducted experiments on game theory and thus gave rise to the famous “prisoner’s
dilemma”, which presents a situation in which two players end up betraying each
other even though it would be in both their interests to cooperate. Among the
pioneers of this sub-discipline, the economist Austin Hoggatt was the first to
set up an experimental economics laboratory equipped with computers, in the
1960s at the University of California, Berkeley. Betting everything on the equipment,
however, he failed to gather a team of dedicated adherents (Svorencik 2015).

The take-off and consolidation phases of experimental economics during the next 
two decades owe everything, or almost everything, to the two fathers of experimen- 
tal economics, Vernon Smith and Charles Plott, Smith’s student at Purdue in the late 
1960s. In 1975, Vernon Smith moved to the University of Arizona, where he set up a 
large, computerized laboratory and continued his experimental work on the market 
question. He attracted students and laid the methodological foundations of this new 
sub-discipline by emphasizing the importance of financial incentives, i.e. paying
the students who participated in the experiments according to the decisions
they made during the experiments. Charles Plott set up a laboratory at Caltech and
developed the experimental current around the questions of voting and public goods, 
while also playing a key role in the scientific community. According to Svorencik, 
Charles Plott should be seen as “a pivotal figure in the development of experimental 
methodology, in building a community at Caltech, beyond that, in setting up 
experimental laboratories, in fighting with editors and reporters to get experimental 
papers accepted by major economics journals, in pioneering applied experimental 
research, and in providing organizational support for NSF-funded economics 
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research” (Svorencik 2015, p. 40). It is an understatement to say that this period was 
in every way decisive for experimental economics and its growing scientific and 
institutional acceptance: as Svorencik (2015) brilliantly demonstrates, the experi- 
mentalists above all deployed their forces to gain recognition from the wider 
economics community while at the same time avoiding their marginalization by 
choosing, for example, to found their sub-community too early: 


Once the “journal battle” was won in the early 1980s, experimentalists no longer feared 
being marginalized and ghettoized by the rest of the profession and launched the first journal 
devoted to experimental research in 1998 (Svorencik 2015, p. 16). 


The current phase of maturity would prove them right: the Bank of Sweden’s prize in
economics, awarded in 2002 to the economist Vernon Smith and the psychologist
Daniel Kahneman,² the first Handbook of Experimental Economics in 1995 and its
second edition in 2015 (Kagel and Roth 1995, 2015), and the very many
special issues devoted to this method, its principles and its difficulties (see for
example Falk and Fehr 2003; Andreoni and List 2005; Normann and Ruffle 2011;
Jacquemet and L’Haridon 2016) all signalled a mature discipline. It is now customary
to speak of the “experimental turn” in economics, which, according to Svorencik
(2015, p. 31), is defined as “the desire to reconceptualize the relationship between 
economic theory and data collection under controlled laboratory or field conditions 
under the supervision of economists”. 

The next section examines in greater detail the principles of experimental eco- 
nomics that have been developed as ways to overcome the biases and difficulties that 
accompany this type of investigation. 


7.2 Principles and Practices of Experimental Economics 


As mentioned above, Vernon Smith laid the methodological foundations of exper- 
imental economics. According to the terminology he introduced in 1982 (Smith 
1982), an experiment in economics is composed of three elements: the environment, 
the institutions, and the outcome, the first two of which are chosen by the experi- 
menter who designs the experiment. As Jacquemet et al. (2019) have recently 
reminded us, the environment describes the initial circumstances of the experiment. 
Its three most fundamental aspects are the number of subjects participating, the 
specification of the goods exchanged and their initial endowments. In a laboratory 
experiment, the number of subjects is constrained by the physical capacity of the 
laboratory, and the experiment itself may then consist of several sessions


²The development of experimental economics is closely linked to behavioural economics.
Behavioural economics attempts to explain behaviour that runs counter to the predictions of
standard models of microeconomic analysis by incorporating non-monetary motivations and
psychological regularities. Behavioural economics has relied heavily on controlled experiments to
collect data (see, for example, Svorencik 2016; Serra 2017).
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where the protocol is repeated with different participants. As for the goods, they can 
be conceived in several ways: money, tokens, experimental currency, real goods, etc. 
Finally, the subjects’ endowments correspond to the resources available to them, but 
also to their individual preferences, the productive resources available to them or 
their roles within the experiment. Secondly, according to Jacquemet et al. (2019,
p. 7), the institutions “define the functioning of the experimental microeconomic
system”, i.e. the set of rules that determine the interactions between the subjects, the 
allocation rules, i.e. the consequences of the agents’ choices on the system, and 
finally the rules of the system’s adjustment process, i.e. the way in which the system 
progressively evolves as a function of the choices (possibly repeated) made by the 
agents. Thirdly, the outcome, a function of the environment and the institutions, 
corresponds to observations relating to the actions and decisions of the agents during 
the experiment at the individual or the aggregate level. 
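Smith’s three elements can be represented as a simple data structure. The following Python sketch is purely illustrative: the class names, the identical endowments and the linear public good allocation rule (with a hypothetical marginal per capita return `mpcr`) are our assumptions, and the rules of the adjustment process are omitted. It shows how the outcome follows from the environment and the institution, given the subjects’ decisions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Environment:
    """Initial circumstances chosen by the experimenter."""
    n_subjects: int
    good: str          # what is exchanged, e.g. "tokens"
    endowment: float   # same endowment for every subject (simplifying assumption)


@dataclass
class Institution:
    """Rules of the game, reduced here to an allocation rule mapping the
    subjects' decisions to their payoffs."""
    allocation_rule: Callable[[Environment, List[float]], List[float]]


@dataclass
class Outcome:
    """Observations at the individual and the aggregate level."""
    decisions: List[float]
    payoffs: List[float]

    @property
    def aggregate(self) -> float:
        return sum(self.payoffs)


def linear_public_good(mpcr: float):
    """Hypothetical allocation rule: each subject keeps what he or she does not
    contribute and receives mpcr times the group's total contribution."""
    def rule(env: Environment, contributions: List[float]) -> List[float]:
        pot = sum(contributions)
        return [env.endowment - c + mpcr * pot for c in contributions]
    return rule


def run_session(env: Environment, inst: Institution, decisions: List[float]) -> Outcome:
    """The outcome is a function of the environment and the institution."""
    if len(decisions) != env.n_subjects:
        raise ValueError("one decision per subject is required")
    return Outcome(decisions, inst.allocation_rule(env, decisions))


env = Environment(n_subjects=4, good="tokens", endowment=20)
inst = Institution(allocation_rule=linear_public_good(mpcr=0.5))
outcome = run_session(env, inst, [20, 10, 0, 10])
print(outcome.payoffs)    # [20.0, 30.0, 40.0, 30.0]
print(outcome.aggregate)  # 120.0
```

Varying a single parameter such as `mpcr` while holding everything else constant corresponds to what experimental economists call a treatment.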

From these three basic elements, the experimenter will vary certain characteristics
of the environment or institutions to observe the consequences of the modification on
the outcome. This brings us to another key concept in experimental economics, namely
treatment, which consists of modifying the independent variables that make up the
environment or institutions in isolation and one after the other, so as to establish the
causal relationship between them and the outcome. But as Serra (2012b) warns us, for a
comparison to be meaningful, subjects must be observed both without the treatment
(control group) and with it (test group). There are different methods for doing this:
the intergroup (between-subjects) procedure allows for comparisons between the
control group and the test group, while the intragroup (within-subjects) procedure
allows for comparisons between individuals in the same group, the same group of
subjects being subjected to one or more treatments. It should also be noted that the
proper control of an experiment, i.e. ensuring that the results are indeed due to one or
other of the treatments implemented by the experimenter, makes it possible to
guarantee its internal validity. In other words, the experimental protocol allows us to
explain the behaviour observed during the experiment.
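The two procedures can be contrasted on hypothetical data. In this sketch, which illustrates the logic rather than any statistical method from the cited works, the between-subjects effect is the difference between two group means, while the within-subjects effect averages each subject’s own difference between the two conditions. A real analysis would also require a significance test.

```python
from statistics import mean


def between_subjects_effect(control, treated):
    """Intergroup comparison: difference between the mean outcome of the
    test group and that of a separate control group."""
    return mean(treated) - mean(control)


def within_subjects_effect(without, with_treatment):
    """Intragroup comparison: the same subjects are observed in both
    conditions; average each subject's individual difference."""
    if len(without) != len(with_treatment):
        raise ValueError("each subject needs an observation in both conditions")
    return mean(w - b for b, w in zip(without, with_treatment))


# Hypothetical contributions (in tokens), for illustration only.
control_group = [4, 6, 5, 7]
test_group = [8, 9, 7, 10]
print(between_subjects_effect(control_group, test_group))

same_subjects_without = [4, 6, 5, 7]
same_subjects_with = [6, 9, 6, 9]
print(within_subjects_effect(same_subjects_without, same_subjects_with))
```

The within-subjects version controls for individual differences because each subject serves as his or her own control, at the cost of possible order or learning effects between treatments.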

Beyond the major principles listed above, which are the true fundamentals of the 
experimental method in economics, a certain number of other principles of “good 
conduct” of experiments have gradually been established, distinguishing experimental
economics from experimental psychology (Serra 2012b). First, anonymity is
advocated and employed in most economics experiments³: typically, subjects arrive
at the laboratory and are seated in front of their computers in such a way as to avoid
any oral or even visual communication between them. They are thus supposed to
make their decisions without any social pressure and in complete freedom. For the
same reasons, it is sometimes desirable to conduct the experiment double-blind,
i.e. even the experimenter is not in a position to know who is making which decision,
in order to avoid the “experimenter effect” identified by Rosenthal (1966), according
to which subjects act more or less cooperatively, in a public good game for


³Except when the purpose of the experiment is to measure the impact of anonymity on the subjects. See, for
example, Andreoni and Petrie (2004) and Samek and Sheremeta (2014). 
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example, if they knew that they were being observed, and therefore potentially 
judged, by the experimenter. 

Second, most experimental protocols provide for a certain number of repetitions
of the tasks given to the subjects. In addition to an argument put forward by Plott that
this repetition would allow subjects to gradually reveal their preferences with respect
to the object of the experiment, Serra (2012b, p. 45) also points out that this
procedure aligns with economists’ interest in the “equilibrium properties of
models (i.e. after a phase of adjustment of behaviours by the agents)”. To avoid
reputation bias, which arises when the same pairs or groups of subjects are matched
over several game periods and thereby get to know each other and how each other
plays, experimenters generally prefer “stranger” matching, i.e. each subject repeats
the task with a different subject or group of subjects in each period. When the aim is
specifically to test the behaviour of agents interacting with known partners, a
“partner” procedure is applied.
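The two matching protocols can be sketched as schedules of pairs. This Python sketch is hypothetical: it assumes an even number of subjects playing in pairs and implements the strict version of “stranger” matching described above, in which no subject meets the same partner twice, using the classic round-robin (circle) method. Laboratories may instead simply re-draw pairs at random each period, which does not rule out repeated encounters.

```python
def partner_schedule(subjects, periods):
    """'Partner' protocol: subjects are paired once (consecutively in the list)
    and keep the same partner in every period."""
    if len(subjects) % 2:
        raise ValueError("an even number of subjects is assumed")
    pairs = [(subjects[i], subjects[i + 1]) for i in range(0, len(subjects), 2)]
    return [list(pairs) for _ in range(periods)]


def stranger_schedule(subjects, periods):
    """'Stranger' protocol (strict version): round-robin pairing in which no
    two subjects are matched twice; possible for at most n - 1 periods."""
    n = len(subjects)
    if n % 2:
        raise ValueError("an even number of subjects is assumed")
    if periods > n - 1:
        raise ValueError("more periods than distinct partners")
    fixed, rotating = subjects[0], list(subjects[1:])
    schedule = []
    for _ in range(periods):
        order = [fixed] + rotating
        # Pair the first with the last, the second with the second-to-last, etc.
        schedule.append([(order[i], order[n - 1 - i]) for i in range(n // 2)])
        # Rotate everyone except the fixed subject by one position.
        rotating = rotating[-1:] + rotating[:-1]
    return schedule


def repeated_pairs(schedule):
    """Count matches with a partner already met in an earlier period."""
    seen, repeats = set(), 0
    for period in schedule:
        for a, b in period:
            pair = frozenset((a, b))
            if pair in seen:
                repeats += 1
            seen.add(pair)
    return repeats


subjects = ["s1", "s2", "s3", "s4", "s5", "s6"]
print(repeated_pairs(partner_schedule(subjects, 5)))   # 12
print(repeated_pairs(stranger_schedule(subjects, 5)))  # 0
```

With six subjects and five periods, the partner protocol produces 12 repeated encounters, while the strict stranger protocol produces none.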

Third, the issue of context in an economic experiment has been the subject of 
much debate (Loewenstein 1999). Eber and Willinger (2012, p. 19) rightly remind us
that “psychologists have clearly shown that behaviors depend on context. The 
problem is that each subject has his or her own perception of the context presented 
to them. Consequently, the experimenter loses some control since he cannot know 
the individual differences in apprehension of the context. It is for this reason that 
economists, unlike psychologists, generally choose to decontextualize experimental 
protocols as much as possible.” The choice made by economists in favour of 
decontextualizing experiments, i.e. proposing situations that are as neutral as possi- 
ble for subjects, is therefore the result of a desire to minimize the behavioural biases 
induced by a protocol and/or instructions that are too connotative. But as stressed 
initially by Loewenstein and then by Serra (2012b, p. 48), the absence of context “in 
social science experiments is in fact a context in itself”. 

Fourth, the absence of deception or fabrication is a principle unanimously
adopted by experimental economists. Unlike in certain currents of experimental
psychology,⁴ the manipulation of subjects is strictly prohibited in economics, to
avoid a further bias, contamination bias: subjects who have already participated in
experiments may adopt a different behaviour, believing that the objective displayed
by the experimenter is not the true one. Such adaptive behaviour would
therefore distort the results. Contamination bias could extend far beyond this and
cast suspicion and distrust on any experimental economics approach.


⁴Deception is often justified in psychological research to increase control in the experiment. The
best-known example is the Milgram experiment, which was designed to assess the degree of
obedience to authority. Participants were made to administer electric shocks to other participants
on the orders of an authority. The shocks were fictitious and the supposedly hurt participants were actors. This
practice of deceiving subjects to increase control in the experiment has also been criticized by some 
currents of experimental psychology. Ortmann and Hertwig (2002), bridging the gap between 
psychology and experimental economics, have highlighted the methodological costs associated 
with this practice, as deception can lead to suspicion of the experimenter’s intentions and thus 
jeopardize the control of the experiment. 
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Finally, the question of financial incentives is also an inescapable part of exper- 
imental protocols in economics, again constituting “a sharp theoretical dividing line” 
between economists and psychologists (Camerer and Hogarth 1999, p. 7; cited also 
by Serra 2012b, p. 51): 


It is the purpose of an experimental protocol to make monetary gains sufficiently attractive to 
be used as substitutes for subjects’ utilities. If we cannot conclude that this is the case for 
observed deviations from the behavior predicted [by economic theory], then we are not able 
to reject the theory that made these predictions (Harrison 1989, p. 761).


It is therefore a question of ensuring that the subjects are fully motivated and of 
rewarding them according to their performance in the experiment. One of the major 
problems of this stance is the impossibility for the experimenter to impose monetary 
losses on the subjects beyond the amount of the endowment introduced into the 
experiment, which makes studies on risk and/or loss aversion more difficult. More 
generally, it is legitimate to ask whether similar behaviors would be observed if the 
amounts at stake were much higher and were likely to have a real impact. For this 
reason, some researchers have chosen to conduct their experiments in developing 
countries (see Duflo and Banerjee 2009; Alpizar and Cardenas 2016), initiatives that
inevitably raise ethical questions.⁵ In this regard, Jacquemet et al. (2019, p. 7)
warn that “payment procedures, like participant recruitment procedures, must com- 
ply with general ethical guidelines and principles and with the approval of an ethics 
committee and a protocol review committee, both of which are specific to the 
institution in which the experiment is being conducted or to which the researchers 
in charge of the protocol belong.” For example, experimental economics laboratories 
commonly have a specific policy in terms of minimum payment for subjects per 
session, compensation for extra attendance, etc. 

To complete this section devoted to the principles and good practices of exper- 
imental economics, we must recognize that despite all the precautions taken by 
economists to minimize bias and ensure that the results truly answer their research 
question, certain difficulties remain, such as those relating to self-selection bias, 
whereby only voluntary subjects participate in experiments, which constitutes a 
problem in the interpretation of the results. Neuroeconomics has recently opened up
one way of ensuring that the results of experiments are reliable: it makes it possible
to identify the brain activities that underlie this or that type of
behaviour and, in so doing, to eliminate the other effects observed when the
decisions of subjects are analysed in a more traditional way.⁶

Another important question remains as to the relevance of the results obtained 
within the framework of experimental protocols carried out in the laboratory. If the 
problem of the internal validity of the results can be at least partially mastered 
through the various measures taken by economists, the same cannot be said of the 


⁵Thus, it may be ethically questionable for experimenters in rich countries to conduct experiments
in poorer countries, thereby using the difference in purchasing power to “inflate” the monetary gains
of subjects in the latter countries, possibly inducing less altruistic, more acquisitive behaviour, etc.


⁶For recent work questioning this new method, see, for example, Vallois (2012) or Serra (2016).
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external validity of the results. In other words, are the results and conclusions 
concerning the behaviour of the subjects valid outside the laboratory, particularly 
in view of its controlled environment and the artificial situations it stages? It has thus 
been stressed that there is a tension between these two validity requirements 
(Loewenstein 1999; Harrison and List 2004; Guala 2005; Serra 2012b), even if 
this tension must be strongly qualified by the fact that external validity cannot be 
considered without its internal counterpart, as Fiorina and Plott indicated as early as 
1978 in their seminal article on experiments relating to the question of voting: 


We are fully aware of (and even share) the skepticism of the discipline as to the usefulness of 
experimental methods. ... If the success of a model in the laboratory does not in any way 
imply its success in the field, a failure in the laboratory raises serious doubts about its 
applicability in field studies. Thus, while we reject the idea that the laboratory can replace 
field researchers, we argue that it can help them decide which ideas are worth pursuing 
(Fiorina and Plott 1978, p. 576). 


The next section examines and highlights the richness of the interaction between
these two types of experiments.


7.3 Laboratory Experiments and Field Experiments: Two Examples


Field experiments allow us to complement the data collected in the laboratory and thus
enrich the analysis of the behaviour of agents in each type of situation by using, for
example, different audiences (professionals, etc.) and/or by varying the methods.
According to the now classic categorization proposed by Harrison and List (2004),
three main types can be distinguished: (1) the artefactual field experiment, which is a
replication of the laboratory experiment, but with different subjects such as bankers, 
children, etc.; (2) the contextualized field experiment, to which contextual elements 
are added in relation to the target audience; (3) the natural field experiment, 
according to which the subjects participate in the experiment in their usual environment,
sometimes without even being aware of the experiment. For the latter, and to
ensure control of the experiment, the experimenters favour a process of random
assignment of subjects to the groups. Let us specify that this third type of
experiment was not conceived as an experiment, contrary to the previous categories, 
but nevertheless allows the researchers to exploit the situation in this way a 
posteriori. In this sense, this third type of experiment is closer to experimental 
reasoning than to experimentation in the strict sense (Bernard 1966). In general, 
these different types of non-laboratory experiments have developed considerably in 
recent years (for a relatively recent review of the literature, see Levitt and List 2009). 

We propose here two research questions that deploy these experimental methods
(laboratory and field) in a complementary manner. The first focuses on voting, and
more specifically on the impact of different voting systems on the way individuals
express their electoral preferences. In a laboratory experiment, van der Straeten et al.
(2010) studied the importance of strategic voting under different voting rules. The
goal is to test whether the behaviors observed in the laboratory are consistent with
the predictions of rational choice theory and, in particular, to see whether the
strategic complexity induced by the voting system has an impact. In this experiment,
participants are asked to choose among five candidates ranked on a left-right axis.
Each participant is initially randomly positioned on the same axis and knows that the
closer the elected candidate is to him or her on the axis, the greater his or her gains
in the experiment will be.⁷ Four treatments corresponding to four different voting
rules are considered: first-past-the-post, two-round majority voting, single
transferable vote, and approval voting. These four voting systems differ in the number
of strategies they make available. For example, under a two-round or transferable
vote system, voters have many options for supporting a favourite candidate, which
complicates their task. In contrast, under first-past-the-post, the strategic
calculations are more obvious. Van der Straeten et al. (2010) show that when voting
systems involve too many strategic considerations, voters rely on simple heuristics
such as voting for their preferred candidate. When voting systems are more amenable
to strategic calculation, subjects do not hesitate to make extensive use of it.
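The payoff rule given in note 7 and the pull toward strategic voting under
first-past-the-post can be illustrated with a small sketch. The candidate positions,
the 0 to 20 axis, the other voters’ ballots and the alphabetical tie-break are all
hypothetical assumptions chosen for illustration; they are not the parameters used by
van der Straeten et al. (2010).

```python
from collections import Counter

# Hypothetical positions of the five candidates on a 0-20 left-right axis
# (illustrative assumption, not the values used in the original study).
CANDIDATES = {"A": 0, "B": 5, "C": 10, "D": 15, "E": 20}


def payoff(voter_pos: float, winner: str) -> float:
    """Payoff rule from note 7: 20 euros minus the distance to the elected candidate."""
    return 20 - abs(CANDIDATES[winner] - voter_pos)


def plurality_winner(ballots: list[str]) -> str:
    """First-past-the-post: the candidate with the most votes wins.

    Ties are broken alphabetically, a hypothetical rule chosen for simplicity.
    """
    counts = Counter(ballots)
    top = max(counts.values())
    return min(name for name, n in counts.items() if n == top)


# A voter at position 6 sincerely prefers B (distance 1), then C (4), then A (6).
voter_pos = 6
others = ["A"] * 6 + ["C"] * 6 + ["B"] * 2  # hypothetical ballots of the other voters

sincere_winner = plurality_winner(others + ["B"])    # vote for the favourite
strategic_winner = plurality_winner(others + ["C"])  # vote for a viable second choice

print(sincere_winner, payoff(voter_pos, sincere_winner))      # A 14
print(strategic_winner, payoff(voter_pos, strategic_winner))  # C 16
```

Under these assumptions, abandoning the favourite B for the viable candidate C raises
the voter’s payoff from 14 to 16 euros. This is the kind of calculation that
first-past-the-post makes relatively transparent, whereas rules with more strategic
options make it harder to carry out.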

Enriching the findings from the laboratory, field voting experiments, which are
contextualized field experiments in the sense defined above, aim to elicit electoral
preferences under different voting rules in conditions as close as possible to a real
voting context: alternative polling stations are set up at the exit of the official
polling stations, and voters are invited to take part after having cast their official
ballot. These field experiments considerably qualify the importance of strategic
voting. Since the early 2000s, a stream of research testing alternative voting methods
alongside the first round of French presidential elections has clearly shown that
voters wish above all to express their true electoral preferences more fully than the
official ballot allows, and that, to do so, they do not hesitate to depart from a purely
rational approach. Thus, contrary to what theory predicts (Núñez and Laslier 2014),
they make abundant use of intermediate evaluations of the candidates in the race,
i.e., evaluations or scores that correspond neither to the maximum nor to the minimum
score, whereas theory predicts that voters will use only these two extreme scores.
Voters appear, for instance, to be reluctant to give negative evaluations to certain
small parties (Baujard et al. 2014, 2018). This type of experimentation therefore offers
rich lessons for anyone who wishes to better understand the actual properties of new
voting methods, which is the only way to conceive of them as genuine alternatives to
current voting methods.

The second research problem concerns the analysis of strategic decisions in the
context of social dilemmas. Behavior in these situations has been studied in the
laboratory through the mechanism of voluntary contribution to the production of a
public good. In this type of experiment, a group is formed from a number of
participants who each receive an initial endowment. They must then decide how
much they wish to contribute to a public good, that is, a good from which all
members of the group benefit regardless of their own contribution. This type of
experiment corresponds to a social dilemma: it is individually rational for each
participant not to contribute to the financing of the public good, whereas the
socially efficient solution is for everyone to contribute their entire initial
endowment. This very simple experiment makes it possible to measure the degree of
cooperation, but also to highlight the “free rider” phenomenon that poses so many
problems for the financing of public infrastructure or environmental protection. It
also makes it possible to analyze which incentive mechanisms increase contributions
to the public good (Chaudhuri 2011). Among these mechanisms, experiments have
shown the positive effects on contributions of different institutions, such as
punishment mechanisms (Fehr and Gächter 2000), the now well-known “nudges” that
encourage individuals to adopt one choice rather than another (Capraro et al. 2019),
and the composition of groups (Page et al. 2005).

⁷ Participants are paid 20 euros minus the distance between the elected candidate and
their position on the axis.
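The dilemma can be made concrete with the linear payoff function commonly used in
voluntary contribution experiments: each member keeps what they do not contribute
and receives a share m (the marginal per-capita return) of the group account, with
1/n < m < 1 for a group of n players. The group size, endowment and value of m below
are illustrative assumptions, not parameters taken from the studies cited.

```python
def vcm_payoffs(contributions: list[float], endowment: float, mpcr: float) -> list[float]:
    """Payoff of each player in a linear voluntary contribution game.

    pi_i = endowment - c_i + mpcr * (sum of all contributions)
    """
    group_account = sum(contributions)
    return [endowment - c + mpcr * group_account for c in contributions]


# Illustrative parameters (assumptions): 4 players, 20 tokens each, m = 0.5,
# so that 1/4 < m < 1 and the game is a social dilemma.
ENDOWMENT, MPCR = 20, 0.5

everyone = vcm_payoffs([20, 20, 20, 20], ENDOWMENT, MPCR)        # full cooperation
nobody = vcm_payoffs([0, 0, 0, 0], ENDOWMENT, MPCR)              # universal free riding
one_free_rider = vcm_payoffs([0, 20, 20, 20], ENDOWMENT, MPCR)   # one defector

print(everyone)        # [40.0, 40.0, 40.0, 40.0]
print(nobody)          # [20.0, 20.0, 20.0, 20.0]
print(one_free_rider)  # [50.0, 30.0, 30.0, 30.0]
```

Under these assumptions, each token a player contributes returns only 0.5 to that
player, so contributing nothing is individually best (the free rider earns 50 instead
of 40), yet full contribution doubles the group’s total earnings compared with
universal free riding (160 versus 80).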

Alongside these laboratory experiments, voluntary contribution games have also been
conducted in the field with populations of different professional or geographic
backgrounds. Gneezy et al. (2016) used the voluntary contribution mechanism in an
artefactual field experiment with Brazilian fishermen. Starting from the observation
that fishermen who work on a lake tend to work alone while those who fish at sea work
in teams, they sought to test whether the way work is organized affects contributions
to the public good. Their results show that the organization of work influences norms
of cooperation: sea fishermen tend to cooperate more in financing a public good than
lake fishermen.


7.4 Conclusion 


We have come to the end of this brief overview of this recent research method, 
experimental economics, which has progressively gained ground in the community 
of economists since the second half of the twentieth century to the point where some 
speak of an “experimental turn” in economics. 

For most experimental economists, however, there is no question of abandoning
economic theory in favour of an exclusively experimental approach. This is what Eber
and Willinger (2012) and Serra (2012b) emphasize when, following Roth (1988), they
set out the different issues addressed by experimental economics. One of the principal
aims of experimental economics is indeed to test the theoretical predictions derived
from economic modelling, not in order to reject them outright if they do not match the
observations collected, but to clarify the hypotheses and their scope, to discriminate
between competing theories, to validate or qualify certain assumptions about
individual preferences, etc. Secondly, experimental economics aims to produce new
knowledge by reversing the hypothetico-deductive method traditional among
economists and adopting an inductive approach, from experimental observations to
the elaboration of new hypotheses: for example, no longer considering an individual’s
absolute utility, which depends solely on his or her own situation, but relative
utility, which also takes into account the situation of others. Thirdly, a commonly
invoked aim of experimental economics is decision support, akin to “economic
engineering”: the economist’s task is to suggest to public authorities or private
organizations more effective ways of solving a specific problem (such as the use of
nudges to fight obesity or protect the environment, or the role of social interactions
in combating tax evasion).

Beyond these three clearly identified and explicitly stated objectives, the experimental
method in economics can of course be used more widely, either in specific areas of
economics or for teaching purposes. It is now well known and widely accepted, for
example, that having students play games in economics courses helps them grasp the
central concepts of the discipline (Eber 2003). More recently, the popularity of field
experiments in development economics, confirmed by the award of the 2019 Bank of
Sweden Prize in Economics to Michael Kremer, Abhijit Banerjee and Esther Duflo, has
also helped consolidate the now central status of experimental economics within the
discipline, while inviting us to keep questioning the links between theory and
experimentation.
Chapter 8
Experimentation in Management Science


Vincent Helfrich 


Like economics, the management sciences were late in incorporating experimentation
into the range of methodologies used to account for aspects of reality. Even today,
this methodology remains marginal in management research compared with its far
greater development in economics. What difficulties prevent its wider deployment?
Do we find the classic experimental difficulties and biases that have been documented
in the social sciences and humanities? How can one conduct an experiment in
management?

This chapter offers some answers to these questions by discussing the experimental
perspectives in management sciences. We will discuss these perspectives in relation
to the view of experimentation proposed in the philosophy of science literature
(Dupouy 2011; Hacking 1992). Our study is based both on an analysis of commonly used
books and articles on research methodology in the management sciences and on our
own experimental practice in intervention-research projects.

In the first part of our study, we present the general context of research in
management sciences and its heterogeneous epistemological program. We will also
specify the knowledge project of management sciences, built around the study of
management devices (Dumez 2014) and the imperative of access to the field. In the
second part, we will present the place of experimentation in management science.
More specifically, we will discuss two forms of experimentation found in this field.
The first is inspired by the protocols of experimental economics and constitutes a
form of “laboratory experimentation”. The second, found in the various forms of
action research or intervention-research, approximates “field experimentation”.
These two forms of experimentation illustrate what we call the experimental
continuum of management sciences. This continuum is part of the epistemological,
methodological and axiological continuum of the discipline (Thiétart 2014). The
third part of the study discusses experimental practices in management sciences in
terms of their experimental reasoning, their methodological, ethical and political
difficulties, and their potential biases.


8.1 The Context of Management Science Research 


In this section, we will present the general context of research in management 
sciences to understand the conditions and constraints under which experimentation 
has been able to establish itself as a research practice. We will highlight the influence
of other social sciences on this young discipline, which originated in economics. We
will also highlight the specific knowledge project of the management sciences and its
applied nature, both of which also shape the possible forms of experimentation.


8.1.1 Management Sciences: A “Pre-science” in Perpetual Evolution


Management sciences can be considered recent sciences within the history of the
social sciences. In many respects, they draw on theories and methodologies found in
economics, sociology, and psychology. Management sciences bring together a
heterogeneous set of research themes and practices (Le Moigne 1988) corresponding
to the various management disciplines (strategy, marketing, human resources, supply
chain, finance, etc.). As such, management sciences are not really unified (David et al.
2012, p. 13). They retain different epistemological sensibilities, different
methodologies, different target journals and communities, and different forms of
teaching. This heterogeneous configuration displays the characteristics of a
“pre-science” (Kuhn 1962) or a “pseudoscience” (Le Moigne 1988), which seems to be a
permanent state in the case of management sciences, as in many social sciences. In
this context, the “quest for legitimacy” and the imitation of older sciences, such as
economics, structure research practices in management sciences. The dynamics of
structuring in the different management sciences are the subject of specific
epistemological reflections (Allard-Poesi and Perret 2014; David 2000; Dumez 2013;
Martinet 1990).

The epistemological stances observed in research lie along a continuum between
a form of positivism and a form of radical constructivism (Thiétart 2014). This
variability influences the nature of the methodologies employed, which range from
quantitative research based on a “correspondence theory of truth” to qualitative or
comprehensive research based on an “adjusting theory of truth” (ibid., p. 36). For the
same reason, the axiological claims of the management sciences lie along a continuum
between “autonomy” and “performativity” (ibid.).
The epistemological, methodological and axiological program of the management
sciences reflects properties of scientific paradigms (Kuhn 1962). Indeed, there is a
kind of common knowledge about how to pose problems and solve them through the
various methodologies of field study. There is also a form of “initiation into the
paradigm” through management courses, the expectations of academic journals,
doctoral theses, etc. On the other hand, like other social sciences, management
sciences experience a coexistence of paradigms, through the continuum presented
above and the heterogeneities it generates. This state of “pre-science” (ibid.) favors
incommensurability between the different approaches. Moreover, the specific object
of study of the management sciences, company management devices, favors an
extra-scientific hybridization of their paradigms with the field (companies, policies),
as if a paradigm were also learned through the field.


8.1.2 A Knowledge Project around Management Devices and the Imperative of Application


Despite the heterogeneity of management disciplines and their practices, the
management sciences share a common denominator: a project that aims at
understanding the construction and functioning of management “devices” under an
imperative of application:


Management as a scientific discipline is interested in devices or arrangements and their 
performance. It analyzes the conception, implementation, operation, and termination of 
arrangements and seeks to highlight the conditions in which they succeed [...] and those 
in which they fail. [...] It is probably the first social science to tackle this question of the 
descriptive/normative so head-on [...] and the first to take such a close interest in the 
question of devices (Dumez 2014, p. 67). 


This study of management devices and arrangements makes it possible to understand
how actors make their choices in a business context. In this regard, management
sciences are complementary to economics, which studies the construction of these
choices rather than their implementation. Management research is thus an applied
branch of economics that has gradually become structured and partially autonomous
since the 1920s in the United States and the 1970s in France.

The common project around management devices leads the management disciplines
to share a preference for “access to the field”. Thus, research either follows devices
in situ or at least responds to a strong expectation that it will formulate “managerial
recommendations”. In action research, these recommendations are immediate, even
indistinguishable from the object of the research. They take a different form in purely
theoretical research (quite rare in management) or in non-participant observation.
The immediacy of application therefore leads to a certain overlap, or even an
inseparability, of the operations of formalisation, validation and operationalization
of the knowledge produced. In this regard, the management knowledge project is
sometimes referred to as an “epistemology of action” (Hatchuel 2005).
The common project of the management sciences, articulated around the study of
management devices through access to the field and associated with the observed
variability of paradigmatic, epistemological and methodological approaches, will
influence the place and form of experimentation in management sciences.


8.2 The Place of Experimentation in Management Science 


In this section, we review the context of the late emergence of the use of
experimentation in management science, before presenting two types of experimental
practice that can be observed in this discipline.

Like economics, the management sciences were not historically experimental. The
experimental perspectives of these disciplines were, at best, reduced to an exercise in
observation, which, according to Claude Bernard (1865), forms only a part of
experimental reasoning. Recent developments in these two disciplines, however,
seem to bring them closer to the experimental sciences. Nowadays, management
sciences offer various forms of experimentation: “Experimentation is a research
approach often used in management. It can be carried out in the laboratory or in a real
situation” (Thiétart 2014, p. 172). Thus, management sciences practice both a form
of laboratory experimentation, which comes as close as possible to Bernardian
experimentation in the restricted sense (Dupouy 2011), and field experimentation
(ibid.), whose implementation offers more flexibility.


8.2.1 Laboratory Experimentation in Management 


Management sciences sometimes make use of laboratory experimentation. The
methodology literature describes it as distinct from investigation and simulation
according to four criteria: objective, design, data collection and analysis
(Thiétart 2014). This methodological form must follow certain principles:


Under no circumstances should participants feel compelled to adopt a behavior induced by 
the experimental situation. The researcher’s job is therefore to create the conditions that 
encourage participants to behave as naturally as possible. [...] On the other hand, these 
methods are sometimes over-simplified and can be limited in terms of external validity. The 
results they produce must be analysed with care because their generalisability is limited 
(ibid., p. 273). 


This experimentation is therefore based on the production of data in the laboratory 
with the aim of testing causal relationships. It is built around experimental protocols 
that define the principles of the experimentation, the nature of the groups (test, 
control), the nature of the stimuli, etc. 

To illustrate this form of experimentation, Thiétart (2014) uses the example of a
study on “moral clarity” bias in professional situations (Wiltermuth and Flynn
1999). This study examines the subjective experience of power and its effect on the
severity of sanctions. The experiment is based on the evaluation, by a panel of
working individuals, of scenarios involving ethical dilemmas in professional
situations. It is a laboratory experiment because the subjects are questioned about
hypothetical situations and not about real cases from their daily working life.
Methodologically, participants are randomly assigned to two groups. The first group
is “conditioned” into a situation of power and the second is a control group. The
main result of this study is that individuals who perceive themselves to be in a
position of power analyze ethical dilemmas more starkly and impose tougher sanctions
than the control group, as if power reduced the ambiguity of judgment and conferred
greater “moral clarity”.
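The random constitution of two groups described above can be sketched as follows.
The participant identifiers, the group size and the fixed seed are hypothetical; the
seed is used only so that the assignment can be reproduced and audited, and nothing
here reproduces the actual procedure of Wiltermuth and Flynn.

```python
import random


def assign_two_groups(participants: list[str], seed: int) -> tuple[list[str], list[str]]:
    """Randomly split participants into a treatment ("power") group and a control group.

    The list is shuffled with a seeded generator and then cut in half, so each
    participant has the same chance of ending up in either condition.
    """
    rng = random.Random(seed)
    shuffled = participants[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical participants
power_group, control_group = assign_two_groups(participants, seed=2024)
print(len(power_group), len(control_group))  # 10 10
```

Comparing the average severity of the sanctions chosen by the two groups then
estimates the effect of the power condition, since randomization makes the groups
comparable on average in everything except the manipulation.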

The formalization of this type of experimentation in the management research
methodology literature (Gavard-Perret et al. 2012) incorporates most of the elements
that qualify as experimentation in other sciences (Bernard 1865, 1963; Campbell and
Stanley 2011). Experimentation in management research is explicitly inspired by
social psychology, with the difference that the context is always a professional
situation and that the aim is to draw managerial recommendations from the
experiment. This formalization also addresses the specific problems of the human
sciences, whose object of study is an “interactive kind” (Hacking 2008) that can, in
various ways, compromise the observance of experimental principles.


8.2.2 Field Experimentation in Management 


The word “experimentation” can have a much broader meaning in management 
sciences: “Management sciences have a wide range of epistemological positions. 
[...] All forms of logical reasoning [. . .] coexist, allowing for various experimental 
methods” (Lesage 2000, p. 74). Other practices are thus described as experimental in
this discipline. Quasi-experimentation (Campbell and Stanley 2011), also used in
sociology, relaxes the rules of group constitution (non-random assignment or no
control group). Experiments are also possible in the context of intervention-research.
Intervention-research is a form of action research (Lewin 1946) used in management.
It builds on the major currents of participatory research such as action research,
action science and decision support science (David 2012).
The aim of intervention-research is to combine the design, deployment, and study of 
management devices in a single project where: “The objective is to understand in 
depth the functioning of the system, to help it define possible trajectories of 
evolution, to help it choose one, to carry it out, and to evaluate the result” (David 
2000, p. 12). From an epistemological perspective, intervention-research promotes 
immediate “cross-fertilization” between the field and research, with the goal of 
transferring and creating knowledge between these two spheres (Helfrich et al. 
2018). 

The experimental character of action research was already analyzed by its founding
authors: “Lewin places his approach [of action research] at the crossroads of
experimental psychology, experimental sociology and experimental cultural
anthropology” (David 2000, p. 4). Chein et al. (1948) mention an experimental
variant in their typology of action research (Goyette and Lessard-Hébert 1986,
p. 152). De Bruyne et al. (1974) include action research among field experimentation
designs, along with experimentation and quasi-experimentation.

In concrete terms, experimentation in the context of intervention-research generally
involves introducing a new management mechanism into the organization
(measurement tool, managerial practice, management system, etc.) and observing
and documenting its implementation, its effects, or its appropriation by the actors.
The notions of “control group” and “test group” are more difficult to implement,
although it is sometimes possible to compare two subsidiaries of a company, only one
of which adopts the system.
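When such a comparison between two subsidiaries is possible, one common way to
analyse it is a difference-in-differences estimate: the change observed in the
subsidiary that adopts the system, minus the change in the one that does not. This
technique is named here only for illustration (the sources cited above do not
prescribe it), and the indicator and all figures below are hypothetical.

```python
def difference_in_differences(treated_before: float, treated_after: float,
                              control_before: float, control_after: float) -> float:
    """Change in the adopting subsidiary minus change in the non-adopting one.

    Subtracting the control subsidiary's change removes trends common to both
    (e.g. a market-wide shift), under the assumption that both subsidiaries
    would otherwise have evolved in parallel.
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical indicator, e.g. an on-time delivery rate in percent.
effect = difference_in_differences(treated_before=80, treated_after=90,
                                   control_before=78, control_after=82)
print(effect)  # 6
```

The parallel-trends assumption is precisely what is fragile in management settings:
two subsidiaries rarely differ only in whether they adopted the device, which is one
reason why control groups are hard to implement in intervention-research.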


8.3 An Experimental Continuum


In line with the heterogeneous epistemological program of the management sciences,
the use of experimentation follows the path of what we call an “experimental
continuum” (Helfrich and Weber 2021) between experimentation in the strict sense
(laboratory) and extended experimentation (field). In this section, we discuss this
continuum in light of the principles of experimentation in the human sciences
(Dupouy 2011).


8.3.1 Qualification of the Continuum of Experimentation in Management Sciences


The different forms of experimentation found in management sciences favour
experimental reasoning (Bernard 1865), which combines observation and
experimentation. Whatever the nature of the experimentation proposed in management
science, however, it will always be affected by the imperative of accessing and
influencing the field, because this lies at the heart of the discipline’s knowledge
project.

Laboratory experimentation in management science involves the intentional
modification of the object of study, with the objective of testing hypotheses in
a controlled space. In this case, the act of observation is limited to the passive
collection of data, with or without the assistance of various artifacts. Field
experimentation in management sciences involves the researcher more directly. As a
result, “the distinction between the researcher and the system he observes, which is
very clear in the classic experimental or non-participatory observation approaches,
becomes more complex in intervention-research” (David 2000, p. 18).
The imperative of action in the field blurs the conceptual boundary between
observation and experimentation. Experimentation is never completely under control,
and observation is rarely without effect on the object of study:


But the question of the experimenter’s power arises: on the one hand, s/he is excluded from 
interfering with the practical objectives of the organization in which the experiment takes 
place, but on the other hand, the experimenter must have the latitude to vary a sufficient 
number of factors (ibid., p. 4). 


The continuum of experimentation in management sciences makes it possible to apply
experimental reasoning to non-experimental data, following a methodology that may
itself be experimental, as is sometimes the case in the social sciences (Berthelot 2001,
p. 21). It also makes it possible to carry out experiments on management devices and
systems without a strictly experimental methodology. This strategy thus attenuates
the opposition between experimental and interpretative reason (ibid., p. 218). This
broadened approach to experimentation is also consistent with the principles of the
epistemology of action of the management sciences, which imply that formalization,
validation and operationalization cannot be separated (Hatchuel 2005).

This experimental continuum makes it possible to identify commonalities
between the different experimental practices discussed, a sort of normative or
methodological basis for an “experimental culture” (Galison 2002) of management
sciences. It also makes it possible to establish differences arising from the
irreducible variability of this experimental culture. Thus, the experimental culture
shared by the two forms of management experimentation is based on the idea of
“voluntary and direct action” on the object of study, with the aim of simulating
situations in which management systems are used. The internal variability of this
experimental culture depends on the level of control and the form of the simulation.
In a field experiment, the possibility of control will be more limited than in a
laboratory experiment. Action on the object of study will be direct and
indistinguishable from the experimental project, which precludes any pre-established
sequence between observation, theorization, and experimentation.


8.3.2 Difficulties and Biases in Management Experimentation


The management science experimental continuum is subject to the same
methodological, ethical, and political difficulties and biases as all experiments in the
humanities (Dupouy 2011).

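As far as methodological difficulties are concerned, no experimental method used
in management science can fully satisfy the three conditions of Bernardian
experimentation in the restricted sense: the isolation of variables, the manipulation
of variables, and the reproducibility of the experiment.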

The complexity of the object of study does not allow for the “isolation of variables”
(condition 1) in either laboratory or field experiments in management. Field
experimentation is incompatible with this principle because it is confronted de facto
with a multitude of variables that influence the effects of the management system
studied in situ. Moreover, these variables keep changing during the experiment.
Laboratory experimentation, although more controlled, does not completely
overcome this difficulty. In Wiltermuth and Flynn’s (1999) experiment, “conditioning”
the test group into a situation of power does not completely erase the participants’
actual experience in their real jobs and professional positions. The same is true of the
“unconditioned” control group, whose members may still base their decisions on
inferences from their actual experience.

The “manipulation of variables” (condition 2) is present in the different forms of
management experimentation, but it is difficult to manipulate each variable
independently of all the others. In field experimentation, the manipulation of
variables consists in introducing the studied device or system into the firm, but the
complexity of the observed phenomena means that control over this manipulation
cannot be guaranteed. At best, researchers have to content themselves with a detailed
description of the processes at work, in the spirit of a “comprehensive approach”
(Dumez 2013, 2016) to the management system. In laboratory experimentation, the
manipulation may rely on conditioning, as in Wiltermuth and Flynn’s (1999)
experiment, but this remains an illusion of control when compared to experiments
performed on “natural kinds” (Hacking 2008) such as quarks or rocks.

The “reproducibility of the experiment” (condition 3) is quite good in laboratory
experiments such as that of Wiltermuth and Flynn (1999). Strict application of the
protocol often leads to equivalent results from one experiment to the next. This is
also the case for classic experiments in economics such as the ultimatum game (Güth
et al. 1982). In the case of field experiments, replication in the same firm is
impossible because of the irreversible effects of introducing the management device.
It is possible, however, to repeat the experiment in several companies or in several
subsidiaries of a group, but there will always be a problem of significant variability
depending on the size of the company studied, its sector, its maturity, etc.

Regarding ethical and political difficulties, field experimentation poses the greatest
challenges for researchers. Within the framework of this practice, it is nearly
impossible to deliberately test management systems that are ineffective, or even
harmful, for a real company. In the same way, it would be difficult to maintain, even
for the sake of the experiment, a “control group” that did not benefit from a system
considered to be effective. This issue has been debated in connection with Esther
Duflo’s experiments in development economics. The problem is avoided in laboratory
experiments, such as that of Wiltermuth and Flynn (1999), because the actors are led
to make choices in simulated professional situations.

The use of deception in the field is problematic for the relationship of trust that 
intervention-research should foster. Indeed, intervention-research is a means of 
monitoring management devices over an extended period (Helfrich et al. 2018), 
which gives access to sensitive data or data that is impossible to obtain through other 
forms of data collection (interviews, surveys). The only “deception” that can be
countenanced in the field is the concealment of the precise purpose of the research
or data collection (e.g., diverting a focus group to analyze reactions, the study of
collective learning in a project). In laboratory experimentation, deception is
theoretically possible. However, the protocols of economic experimentation, which,
unlike those of experimentation in psychology (Ohana 2004; Serra 2012), discourage
deception, have influenced management science and limit its use there.

On the question of experimental bias, there is, once again, variability depending 
on the case. In laboratory experimentation, the experimenter attempts to limit his or 
her influence on the participants, whereas in field experimentation, the effect of the
observer is unavoidable but accepted, and, in intervention-research, even sought.
Indeed, this influence on the field is at the heart of the knowledge project and the 
methodology. 

The “subject effect” can be quite problematic in both cases. In laboratory 
experimentation on management practices, the actors’ experience outside the
experiment remains a potential bias, despite conditioning. In the
context of intervention-research, the researcher is not immune to being “fooled” by
the actors and the system (Crozier and Friedberg 1977). Companies all have a
“shadow side” (Dumazert and Helfrich 2017; Griffin and O’Leary-Kelly 2004;
Vaughan 1999), conscious or unconscious, that is difficult to identify and
therefore to analyze. Thus, the intervention phase and the use of the management
device implemented by the researcher may very well constitute a “spectacle” compared
with the normal life of the company, which may then escape the researcher.

Finally, the implementation of a laboratory or field experiment in management is 
also subject to “selection bias”. The groups tested in a laboratory experiment may 
turn out to be a poor representation of the company’s overall population. In 
intervention-research, the constitution of the company’s project team may also be 
a poor representation of the company’s entire population (for example, in the 
absence of managers or operational staff). Moreover, the company cannot invent a
key resource that it lacks but that the proper deployment of the system, and
therefore of the associated experiment, requires.


8.4 Conclusion 


Management sciences are by nature applied sciences. They are interested in the study 
of the multiple and varied devices and systems deployed in organizations. The use of 
Experimentation is one of the many methodologies that management science uses to
study these systems. It can be used to simulate these systems in the laboratory
or to observe them in the field using experimental methods.

Experimentation in management sciences is part of a continuum that reflects the 
heterogeneity of epistemological positions, methodological choices, and axiological 
claims of the different management disciplines. Experimental perspectives in
management sciences also seem to be greatly influenced by the history of this
“pre-science” (Kuhn 1962), which has not yet settled on a paradigm. For example, the
legacy of economics or psychology still seems to be present in certain experimental
practices in management science, despite the latter’s quest for autonomy.

In this chapter, we have structured our discussion around cases that reflect this 
variability of experimental methods. On the one hand, laboratory experimentation 
uses all the features of Bernardian experimental reasoning. On the other hand, field 
experimentation, which is more pragmatic and flexible, proposes an adaptation of 
experimental reasoning to deal with non-experimental data according to a method- 
ology that can be experimental, as in other social sciences (Berthelot 2001). 

The classic experimental difficulties and biases of the human sciences are also
found in the experimental continuum of the management sciences. Questions about
the influence of the experimenter, the experimental device or the subject are similar to
those raised in other human sciences. Only the ethical and political difficulties seem to
be peculiar to the management sciences, notably in the practice of field experiments,
as in the case of intervention-research. We also observe a specificity of the
management sciences: their desire to influence the field, which is part of their
knowledge project. In a laboratory experiment, this influence is indirect and
delayed in time. It takes a concrete form through the
“managerial recommendations” produced in this research, which can be passed on to
organizations through various channels (training, monitoring, professional
journals). In field experimentation, the influence is direct and merges with the
experiment itself, since the device studied can be at the same time the stimulus for the
experiment and its application in the field. Thus, the imperatives of access to and
influence on the field in management sciences give the experimental method different
aims. The experimental method is a way of understanding management devices and
systems in situ before it is used to test ideas in experiments.
Chapter 9
Experimentation in Medicine


Stéphanie Dupouy 


Since the Ancients, medicine has, of all the sciences, accorded the highest status to 
the notion of experience in its epistemological identity. We need only recall that 
“empiricism” designated a medical school for centuries¹ before becoming a
philosophical current. The texts of the Hippocratic corpus (fifth–fourth century BC)
already affirm that the medical art is based in part on experience (empeiria): faced 
with a case, the physician must carefully observe the patient and recall the cases 
observed in the past (noting the differences and similarities of the signs presented by 
the patients), by virtue of which he or she can render a diagnosis and a prognosis 
(Lloyd 1990). From then on, the medical schools of antiquity differed as to the
place to be given to causal inference in medicine. For the Hippocratic school, which
in the third century BC came to be called “dogmatic”, knowledge of the causes of disease
and of the nature of the body was essential to medical practice (for example, the theory
of the humours), and this knowledge could only be acquired by combining
experience with reasoning. Those whom the Greeks called “oi empeirikoi”, the
“empiricists” (third–second century BC), a medical school close to scepticism,
maintained, on the contrary, that theoretical speculations were useless and dubious:
the only thing that mattered to medicine was the effectiveness of the treatments, 
obtained through observation and the confirmation by repetition of therapeutic 
successes obtained by chance. The physician relied on personal experience, the 
accumulated experience of other physicians, transmitted orally or in writing, and 
on reasoning by analogy (Grmek 1997; Pellegrin 2011). This is the source of the old 
French term “empirique” (in the sense of an empiricist), which became pejorative 
(synonymous with charlatan) only much later. This empiricist genealogy readily 


¹ This is still, for example, the meaning of the terms “empirical” and “empiricism” in The Great
Encyclopedia.
extends to the entire history of medicine. Even in the nineteenth century, an old
commonplace in medical textbooks asserted that all the progress of medicine, from 
its origins, had been made by trial and error: all medicine would then be nothing 
more than a long and continuous experiment, transmitted by medical tradition. The 
opposition between “dogmatists” and “empiricists” permeates the entire history of 
medicine (Wulff et al. 1993; Tröhler 2012).

While the modern concept of “experimentation” is not the same as “experience”
or “empirical”, these ancient (and competing) conceptions of experience and its role
in medicine, together with this rich and polysemous conceptual and semantic heritage,
permeate the history of experimentation in medicine. To say where exactly
experimentation in medicine begins is, therefore, not an easy task. Three meanings of the
notion will be distinguished here: experimentation as a clinical practice, experimen- 
tation as a trial of a new treatment, and experimentation as a systematic comparative 
trial. The present chapter focuses on both the methodological and ethical sides of 
these different concepts, which are often inseparable, and gives an overview of their 
evolution and their historical and philosophical relationships. 


9.1 Experimentation as a Clinical Practice 


In the broadest sense, medical experimentation merges with the clinic and refers to 
the acquisition of knowledge in and through the care of individual patients. “To care 
is to conduct an experiment”, writes Georges Canguilhem (2002, p. 389), echoing 
Claude Bernard: “Every day the doctor conducts therapeutic experiments on his 
patients, and every day the surgeon performs vivisections on his patients” (Bernard 
2008, p. 252). Each new patient is in fact a “test” for physicians, testing the accuracy 
of their diagnosis, the effectiveness of a treatment they presume to be beneficial, and 
even, in certain cases, the validity of a theoretical hypothesis. Any medical inter- 
vention constitutes, strictly speaking, a form of experimentation, insofar as it is 
carried out in a situation of uncertainty, involving the unforeseen and risk. This first 
sense of medical experimentation is characterized by its proximity (and openness) to 
the old sense of “experience” (empeiria, experientia) in the sense of knowledge 
acquired through practice, habit, or age: traditionally, it is the know-how possessed 
by the “experienced” physician that makes a good clinician. As Bernard says, in the 
same passage: 


An old doctor who has often administered medicines and who has treated many patients will 
be more experienced, that is to say, will experiment better on his new patients because he has 


² Recall that the term “experimentation” in French is a neologism that first appeared in 1834 in a
dictionary of the French language (Grmek 1997, p. 14); the difference between “experiment” and
“experimentation” nevertheless corresponds in part to the doublet “experientia”/“experimentum” in
modern times.
learned from the experiments he has made on others. The surgeon who has often performed
operations in various cases will learn and perfect himself experimentally (ibid., p. 252).³


Concerning this first meaning of medical experimentation, we refer readers to Jean- 
Christophe Weber’s contribution to this book. 

It should be noted that this clinical know-how, although embodied in an individ- 
ual, is not generally thought of as breaking with medical tradition: the experience of 
the clinician of today lines up with the experience of past centuries. In the other two 
meanings of experimentation that we now examine, experimentation appears as what 
is required when medical tradition is no longer sufficient to treat or to know. 


9.2 Experimentation as a Test of a New Treatment 


In a more precise sense, experimentation begins when the physician, confronted with 
a disease for which the usual treatments are not very effective or do not exist, tries a 
“daring” remedy, which has not yet been tested. This sense of experimentation as a
trial, an attempt or a test refers to the Greek peira and the Latin experimentum.⁴
Strictly speaking, this form of therapeutic experimentation can only be carried out on 
the sick subject, in the context of care. In a slightly broader sense, however,
experimentation thus conceived can also be carried out on the healthy subject, on
oneself (that is, on the physician himself), on an animal, or even on a cadaver, because
before knowing whether a remedy is effective for a given disease, the physician can
first try to ensure its harmlessness.

This sense of experimentation is close to what the other sciences call “experiment 
to see”. But the singularity of this sense of medical experimentation compared to the 
other sciences is that it is perilous and implicates the responsibility of the physician. 
“Art is long, life is short, opportunity is fleeting, experience is dangerous, judgment is
difficult”, as the Hippocratic aphorism states.⁵ Experimentation is dangerous because
it affects the body of the subject of the experiment, unlike other sciences that
manipulate inert things of no intrinsic value. In contrast to experimental devices that
operate only in the artificial space of the laboratory, or that use fiction to manipulate
behavior (in psychology, for example), a medical experiment always takes place, for


³ Bernard plays on semantic ambiguity here, both to show himself, for once, conciliatory towards
medical empiricism, but also to suggest that doctors carry out experiments in the ordinary practice 
of medicine, which has the consequence of legitimizing therapeutic experimentation in the context 
of care. 

⁴ In the context of the experimental sciences of the seventeenth century the Latin word
experimentum acquired a different meaning from that of experientia. See, for example, Francis
Bacon: “There remains the simple experience (experientia). When it presents itself, it is called
chance. When it is sought, it is called experiment (experimentum)” (Bacon 2001 [1620], Book I,
Aphorism 82).

⁵ On this first Hippocratic aphorism, see Chamayou (2008, p. 14 sq.) and Grmek (1997,
p. 115 sq.).
the subject of the experiment, in “real life”, and occurs in a temporality characterized
by urgency and irreversibility:
It is well known that the test (peira) is dangerous, because of the object on which the medical
art is exercised. Indeed, unlike other arts in which one can experiment without danger, the
materials of medicine are not skins or logs or bricks; but it experiments on the human body,
on which it is not without danger to experiment with the inexperienced (Galen quoted by
Grmek 1997, p. 128).


Medicine’s awareness of this danger very early on can be seen in the age-old
practice, studied by Chamayou, of experimenting with remedies on “vile bodies”⁶:
those condemned to death, prisoners, orphans, prostitutes, slaves, the colonized,
poor people or those dying in hospitals (Chamayou 2008). “To experiment: to try, to
experience something. [...] One experiments remedies on people of little importance”,
says Furetière’s Dictionnaire (1690). Thus, when the Princess of Wales heard
about preventive inoculation against smallpox (an ancestor of vaccination) in 1721
and considered having her children inoculated, the inoculation was first carried out
on six voluntary prisoners from Newgate prison, who were pardoned following the
experiment, and then on children from St James’ orphanage, which did not trouble
anyone at the time. The “vile bodies” were used as “doubles” to experiment with
remedies in place of and for the benefit of the noble or bourgeois bodies. Later, it was
in hospitals, and not on their private clients, that doctors would experiment with
dubious remedies.⁷

However, from the nineteenth century onwards, experimentation in the sense of a 
test gradually became the subject of deontological reflection by the medical
profession. Prominent voices rejected experimentation on convicts, prisoners, children,
and hospital patients, either in the name of respect, humanity or the care that was due
to them, or because these practices were unworthy of science.⁸ Some also argued that
testing a new treatment was acceptable only when done for the direct benefit of the
subject, which excluded both experiments for scientific curiosity, with no potential
therapeutic benefit to the test subject, and experiments that might harm the subject by
exposing the patient to a possibly harmful or ineffective remedy, or by depriving
them of a proven remedy. The daring trial was seen as morally acceptable in only
two cases: first, when the patient had nothing left to lose and the outcome was fatal,
provided that the test was carried out in his or her direct interest and that there was
no treatment for the disease; second, when the disease was not very serious, its outcome
not dangerous, and the treatment harmless. Claude Bernard held


⁶ According to the adage “Faciamus experimentum in corpore vili”. On this locution, see Chamayou
(2008, p. 8 sq.). 

⁷ On the history of these dangerous experiments and the correlative emergence of the ethics of
experimentation, see the work of Anne Fagot-Largeault (1985, 1993, 2000), Christian Bonah et al.
(2003), Bonah (2007), Anita Guerrini (2003), Grégoire Chamayou (2008) and Philippe Amiel
(2011). The following pages draw heavily on the publications and teaching of Anne
Fagot-Largeault.

⁸ However, this position was not unanimous: some, like Pasteur, called for the possibility of
experimenting on those condemned to death (Amiel 2011, p. 66). 
views close to this when he stated that “among the experiments that can be attempted
on man, those that can only harm are forbidden, those that are innocent are permitted, 
and those that can do good are ordered” (Bernard 2008, p. 253). The first rule to 
follow, according to Claude Bernard, was therefore that of primum non nocere. 
Secondly, experimenting in order to do good was not only a right but also a duty of 
the physician. Finally, experimenting to know was allowed if and only if the 
experimentation was not disadvantageous to the subject. But how can we know 
whether a remedy is likely to be beneficial or detrimental to a patient? That was the 
question. The physician alone was the judge. For Bernard, as for almost all physi- 
cians of his time, the question of information and consent of the subject did not arise. 
In this perspective, not only could medical science take advantage of therapeutic
experiments to learn (as it always had done), but it also had the right to practice
harmless experiments without benefit to the subject, even without the subject’s
knowledge.

Physicians of Claude Bernard’s time often affirmed that therapeutic
experimentation, even when limited in this way, should not be done haphazardly and should
remain prudent, such prudence taking precedence even over the patient’s possible wishes. A
physician should not experiment with a remedy whose composition he did not know and
should always be guided by the analogy of diseases and remedies. A trial demanded
good prior knowledge of the subject of the experiment, a good knowledge of the 
state of the art in order to avoid redundancies and unnecessary risks, a consultation of 
colleagues (and, if possible, a collegial decision), and finally prudence in the dosage 
(Chamayou 2008). Many also demanded that physicians experiment on themselves 
with new remedies before administering them to their patients. Experimentation on 
oneself, very common among physicians in the nineteenth century, was fatal for 
some of them (Altman 1998). At the end of the nineteenth century, it was also 
increasingly recommended that experimentation on humans be preceded by exper- 
imentation on animals: the idea was gaining ground that the testing of a new 
treatment on a patient should be the result of a prior process (and we thus gradually 
approach experimentation in the third sense of the term). 

This evolution of ideas on the ethics of the therapeutic trial, however, had little
immediate impact on practices. Reckless and dangerous, even criminal, experiments,
often conducted on vulnerable, non-consenting and uninformed subjects and
published openly in the medical press, continued throughout the nineteenth and most of
the twentieth century. The disregard for patients was compounded by the weight of
medical dogmatism and paternalism, and by the belief that bold experiments were always
the condition of medical progress, or a requirement of the experimental method, as
will be seen later.

Nevertheless, the first regulatory texts that began to formalize this
ethical awareness appeared. They distinguished clearly between “sanctioned” medical
practice, in conformity with the state of the art, and experimentation with new remedies,
whose outcome was risky. Thus, the Medical Code of 1853, in France, required that the
testing of new therapeutic substances be supervised and approved by a
collegial commission appointed by the administrative authorities. A few decades
later, in 1900, at a time when the German chemical industry was booming, the 




Prussian physicians’ code also stipulated that doctors who suggested that a patient 
try a new drug must without fail warn the patient that the product was new. In 1931, 
the Weimar Republic promulgated “Guidelines [Richtlinien in German] concerning 
new medical treatments and scientific experimentation on human beings”. They 
stipulated that no new treatment could be administered without having first been 
tested on animals, without having assessed the benefit/risk equation, i.e. the balance 
between the expected good and the possible damage — while affirming, like Claude 
Bernard, that these daring trials were a duty of the physician when the failure of 
traditional therapeutic means could be foreseen. 

But above all, the Richtlinien drew a contrast between the therapeutic trial of a 
new treatment, which sought the health of an individual person, and another form of 
investigation called “scientific experimentation”. This brings us to the third meaning 
of the notion of medical experimentation, undoubtedly the most common today, in 
which medical experimentation, contrary to the two previous meanings, is distin- 
guished from care by its purpose: to acquire generalizable knowledge, and not to 
treat an individual patient. 


9.3 Experimentation as a Methodical Comparative Test 


In a third sense, medical experimentation designates a research practice that, through
intervention on individual subjects, seeks to obtain knowledge of general scope 
about diseases and treatments. Individual care is thus no longer the point of the 
experimental intervention, even though it might possibly be beneficial to the subjects 
who participate in the experiment. 

Linked to the experimental method, this third sense of experimentation grew out 
of a critique of the first two. When conducted within the framework of the care of an 
individual, the testing of theories, treatments, and routines remains inconclusive and
of little use. While it is easy for a physician to imagine that a patient has been
cured by the remedy that the physician has prescribed, proving the general
effectiveness of this remedy in this disease is another matter. How can stable knowledge
be created when sick individuals are all different and are never exactly sick in the
same way, even when they have the same disease? When doctors vary in their
diagnostic criteria and therapies? When patients, finally, never react the same way to
the same treatment? Medicine seeks to find regularities in this variability in order to
rationalize care (Fagot-Largeault 1989, 1993).⁹ Experimentation is the means for
this rationalization of the clinic and the essence of this approach is comparison. As
Claude Bernard said: 


⁹ Littré formulated the problem crisply in his comments on Hippocrates’ aphorisms: “In medicine,
where an experiment can never be repeated under identical conditions, the experiment is exposed to 
inevitable failures. [...] The infinite variability of the sick subject and the impossibility of repeating 
on the same person a treatment that has ended badly lend a quite particular character to medical 
experimentation.” (Hippocrates, Aphorismi, I, 1, Littré IV 458-59, quoted by Grmek 1997, p. 115). 




Comparative experimentation is the sine qua non of experimental and scientific medicine. 
Otherwise, the physician sets off on a wild goose chase and becomes the plaything of a 
thousand illusions. A doctor who tries a treatment is inclined to believe that the cure is due to 
his treatment. Physicians often boast that they have cured all their patients with a remedy 
they have used. But the first thing they should be asked is whether they have tried to do 
nothing, that is, not to treat other patients; otherwise, how can we know whether it is the 
remedy or nature that has cured them? (Bernard 2008 [1865], p. 398). 


In this sense, medical experimentation is like (and modelled on) other sciences. 
Medicine seeks to compare experimental subjects “all other things being equal” in 
order to establish facts and evidence concerning the safety and efficacy of treatments 
and the causes of diseases. Again, like other disciplines, it seeks to find a compro- 
mise between two requirements: on the one hand, to extract itself from the com- 
plexities of the clinic and real life, in order to control the parameters and inferences; 
on the other hand, to obtain results that can be transferred to the clinic. But here
again, what makes medicine unique is the fact that it deals with human beings, sick 
human beings, whose integrity must be protected, and for whom care cannot wait. 
Medical experimentation, understood in this way, is caught between two pitfalls: the 
risk of instrumentalizing experimental subjects (or sacrificing the individual to the 
collective good); and the risk, if experimentation is not carried out, of abandoning 
patients to ineffective or harmful traditional treatments, or to “wild”, semi- 
clandestine and anarchic trials. 

Thus conceived, medical experimentation developed historically, especially from 
the nineteenth century onwards, according to two models often opposed, point by 
point, in the history of medicine: comparative clinical trials, and laboratory medicine 
(see e.g. Wulff et al. 1993; Fagot-Largeault 2010, pp. 39–40). The two currents (the
empiricist and “numerical” tradition on the one hand, and the Bernard-Pasteur
tradition on the other) nevertheless shared some common features. Both were highly
critical of traditional medicine, and both denounced the ineffectiveness, and even the 
danger, of the therapeutic routines then in use. But they differed in how they 
circumvented the epistemological obstacle posed by individual variability, as well 
as in the ambition (pragmatic or causal) they assigned to research. 


9.3.1 Experimentation in the Empiricist and Numerical 
Tradition 


Let us begin with the empiricist tradition, which developed in the eighteenth and nineteenth
centuries, in England and then in France, among a minority of reformist clinicians
(Tröhler 2005, 2012). Skeptical about medicine’s claim to know the true causes of
disease, these physicians put the evaluation of the efficacy of treatments ahead of
medical theory. They advocated trying treatments without dogmatism, by trial and
error rather than by rational deduction, by comparing groups of subjects treated in
different ways. James Lind (1716–1794), a physician in the Royal Navy, carried out
a small, carefully planned trial on board a ship in 1747, which has remained famous
in the history of medicine, by testing six possible remedies against scurvy (citrus
fruits, cider, vinegar, vitriol, sea water and pharmaceutical preparations) on 
12 affected sailors from his crew, divided into six groups of two, and found that 
those who received citrus fruits recovered. However, his discovery remained a dead 
letter for decades, as the comparative test was not yet widely used (Fagot-Largeault 
1992). 

Other practitioners emphasized the need for a larger-scale view of therapeutic 
successes and failures in order to overcome the fortuitous and limited nature of the 
cases that individual practitioners encountered, and to develop a clinic less reliant on 
chance. This way of reasoning received a decisive boost in the eighteenth century in 
the context of the debates sparked by the advent of smallpox inoculation — which 
constituted a form of large-scale, albeit uncontrolled, experimentation.
Mathematicians specializing in the calculation of probabilities (La Condamine, Bernoulli,
D’Alembert) tried, by examining mortality tables and counting the number of 
accidents attributable to inoculation, to determine whether the gamble of being 
inoculated was more profitable than relying on nature (Moulin 1996; Corteel 2020). 

A few decades later, Pierre-Charles-Alexandre Louis (1787–1872), a French
physician marked by the empiricism of the Paris School and an actor in the “birth 
of the clinic” in the Parisian hospitals of the time, became a pioneer of the “numerical 
method”, or the use of statistics in medicine (Armitage 1983). Like Pinel before him, 
he insisted on the need to make medical experience cumulative from patient to 
patient and from practitioner to practitioner by the standardized recording of cases 
and treatment results and by describing patient symptoms in detail to improve the 
stability of diagnoses. But above all, he advocated testing treatments in the hospital, 
on a large (or relatively large) number of patients, by recording the results obtained 
by different therapies. Louis is remembered for experiments, carried out at the
Hôpital de la Charité between 1821 and 1827, demonstrating the ineffectiveness of
bloodletting in the treatment of pneumonia (for which it was the most common 
remedy in France at that time). Carefully selecting patients with comparable forms of 
pneumonia, he created groups of patients to whom he administered bloodletting at 
progressively later stages of the disease (which amounted to implicitly not treating 
some of the patients), and then looked to see if the treatment made a difference by 
comparing the number of deaths and survivors in the different groups. Louis’ 
statistics showed that bloodletting kills — even though, strictly speaking, his results 
were not statistically significant. Louis was unaware of contemporary work in the 
calculation of probabilities that would have allowed him to know how many patients 
his study would have had to include to be conclusive.¹⁰ Nevertheless, Louis had an
intuition for the statistical and probabilistic logic at work in a trial of this type: since 
the patients were all different and the happy or unhappy outcome for each patient 


¹⁰ The mathematician and physician Jules Gavarret (1809–1890), author of the first textbook on
medical statistics (Principes généraux de statistique médicale, 1840), criticized Louis’s work by
showing that his study, based on 78 cases of pneumonia, would have had to include several hundred
patients in order to be conclusive, and that he should have assessed the risk of error (very high in a
sample of 78 cases) associated with his conclusion.
was the result of the joint action of a large number of random (and unknown) causes,
he said, “we must necessarily count”.¹¹ The combination of a large number of cases
allows one to obtain groups that tend to differ only in the administration (or not) of 
the remedy. 
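Gavarret’s objection can be made concrete with modern tools that neither he nor Louis used. The minimal Python sketch below is only an illustration under stated assumptions:

- The death counts and mortality rates are hypothetical. They are not Louis’s data.
- The pooled two-proportion z statistic is a standard textbook approximation, not Gavarret’s own calculation.
- The sample-size formula is likewise a standard normal approximation. It assumes a two-sided 5% significance level and 80% power.
- The function names are invented for the example.

```python
import math


def two_proportion_z(deaths_a, n_a, deaths_b, n_b):
    """Pooled two-proportion z statistic comparing mortality in groups A and B."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    # Under the null hypothesis both groups share a single mortality rate.
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error


def patients_per_group(p_a, p_b, z_alpha=1.96, z_beta=0.84):
    """Approximate number of patients needed in each group to detect p_a versus p_b.

    z_alpha = 1.96 corresponds to a two-sided 5% significance level and
    z_beta = 0.84 to roughly 80% power (normal approximation).
    """
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


# Hypothetical trial of roughly Louis's size: 16/40 deaths versus 10/40 deaths.
z = two_proportion_z(16, 40, 10, 40)
print(f"z = {z:.2f}")  # z = 1.43, below the conventional 1.96 threshold

# Hypothetical mortality rates of 40% and 25%.
print(patients_per_group(0.40, 0.25))  # 149 per group
```

With these assumed figures, a trial of about 80 patients yields z ≈ 1.43, which is not significant at the 5% level. Detecting the same difference would call for roughly 149 patients per group, about 300 in all. That is consistent in order of magnitude with the “several hundred” patients mentioned above.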

Louis’ method aroused unease in the medical community and a vigorous debate at 
the Académie de médecine. Some denounced it as the “medicine of folded arms” 
(Chamayou 2008), others invoked the clinical intuition (“medical tact”) of the
“experienced” physician in opposition to the standardization of therapeutics that
the numerical method suggested. But above all, experimentation coupled with
statistics radically challenged medical knowledge and authority and subjected both
to a form of public control that was unacceptable to most physicians at the time (see 
Weisz 1995; Matthews 1995; Jorland et al. 2005). It was not until the middle of the 
twentieth century that this type of reasoning, supported by new tools derived from 
the experimental design methodology of Ronald Fisher (1890-1962) and from the 
mathematical statistics of Karl Pearson (1857-1936), Jerzy Neyman (1894-1981) 
and Egon Pearson (1895-1980) returned in force through randomized controlled 
clinical trials (Fagot-Largeault 1992; Giroux 2011). 
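Louis’s argument that unknown causes tend to balance out between groups when enough patients are counted can be illustrated with a small simulation. This is a hypothetical sketch, not a reconstruction of Louis’s data. Each simulated patient gets an unobserved “severity” score drawn from a standard normal distribution, which is an assumption. The patients are then split at random into two equal groups, a device that comes from the later Fisherian design rather than from Louis. The function names are invented for illustration. The code measures how far apart the two groups’ average severities are.

```python
import random
import statistics


def mean_severity_gap(n_patients: int, seed: int) -> float:
    """Hypothetical illustration: give each patient a random, unobserved
    'severity' score, split the patients at random into two equal groups,
    and return the absolute difference in mean severity between them."""
    rng = random.Random(seed)
    severity = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
    rng.shuffle(severity)  # random allocation: shuffle, then cut into two halves
    half = n_patients // 2
    return abs(statistics.fmean(severity[:half]) - statistics.fmean(severity[half:]))


def average_gap(n_patients: int, repetitions: int = 500) -> float:
    """Average the gap over many simulated trials of the same size."""
    return statistics.fmean(
        mean_severity_gap(n_patients, seed) for seed in range(repetitions)
    )


for n in (20, 80, 320, 1280):
    print(n, round(average_gap(n), 3))
```

Under these assumptions, the average gap shrinks roughly in proportion to 1/√n. Quadrupling the number of patients halves the typical imbalance, which is the quantitative sense in which larger groups “tend to differ only in the administration (or not) of the remedy”.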


9.3.2 Experimentation and Laboratory Medicine 


Another project of experimental medicine was invented in the mid-nineteenth-century
physiology laboratory. Popularized in France by Claude Bernard’s Introduction
à l’étude de la médecine expérimentale (1865), this rejection of medical
empiricism asserted that medicine could only become scientific through (experimental)
knowledge of the (physiological) causes of diseases and of treatment mechanisms.
The clinic was therefore to learn from physiology. Physiology is an
experimental science because in order to elucidate the causal mechanism of organic
functions, it is necessary to disturb these mechanisms by experimental intervention
(or vivisection): sublata causa, tollitur effectus. As organisms are unified totalities,
experimental intervention implies damaging and destroying the studied organism
and must for ethical reasons be carried out primarily on animals. Bernard did
not exclude, however, as we have seen, that a physician might learn something from
the “natural experiments” of disease and therapeutic intervention on human beings;
but, on the whole, the variability of patients and the imperative of care hardly
allowed clinical reason to assert the necessary “all things being equal”, as the
physiologist was able to do in the laboratory on animals. The logic of physiological
experimentation was therefore to control biological variability as much as possible
by operating on animal models that are as similar as possible (or on detached parts
of the body, kept artificially alive), placed in identical conditions, to ensure the
comparability and replication of results in order to produce evidence and
counterevidence.

¹¹ “It is precisely because of the impossibility of assessing each case with a sort of mathematical
exactness that we must necessarily count. Since the errors, the inevitable errors, are the same for two
groups of patients treated by different procedures, these errors compensate for each other, and can
be neglected, without appreciably altering the accuracy of the results” (P. C.-A. Louis, Recherches
sur les effets de la saignée, Paris, Baillière, 1835, p. 76).

Understood in this way, physiological experimentation presupposed, according to 
Bernard, the postulate of determinism as a regulative ideal: in order to search for
causes and to experiment, it is necessary, Bernard tells us, to admit that phenomena 
are, in principle, entirely determined by their causes, that the same causes always 
have the same effects, and the same effects always have the same causes. Even if 
biological individuals are all different, one must posit the invariability of the 
physiological mechanisms between organisms. Experimentation aims at progres- 
sively attaining the reproducibility of the causal relations through the increasing 
control of the experimental parameters. This deterministic choice and its causal 
ambition explain why Bernard rejected the numerical method. Clinical statistics 
were, he thought, powerless to reveal the true laws of physiology and pathology, 
which are necessary and not merely probable. As long as therapeutic efficacy 
remained probable and not certain, a remedy had not been found. Clinical statistics 
were also powerless to uncover the causes of diseases. They informed clinicians only 
about the effectiveness of the treatment. Medicine would be truly scientific only 
when it explained why a remedy worked, and not simply that it worked: 


Scabies is a disease whose determinism is now more or less scientifically established; but it 
was not always so. In the past, scabies and its treatment were only known empirically. One 
could then [...] draw up statistics on the value of this or that ointment for curing the disease. 
Now that the cause of scabies is known and scientifically determined, everything has become 
scientific, and empiricism has disappeared. [...] there are no more statistics to be established 
on its treatment. One is always cured without exception when one places oneself in the 
experimental conditions known to achieve this goal (Bernard 2008 [1865], p. 428). 


For Bernard, and for many physiologists or biologists of his generation, statistics 
could not be scientific. At best, they had only a provisional and approximate utility. 

What then was the relationship between laboratory science and the medical clinic 
envisaged by Bernard? In principle, physiological experimentation should allow for 
the experimentalization of medicine in that the physiologist could artificially recreate 
(in the laboratory, by vivisection) natural pathologies, then imagine remedies and 
test them in vivo. In practice, physiological research remained far from the clinic 
because human pathologies are complex and physiology was in its infancy. The 
physiologist operated on animals and clinical medicine could not wait for physiol- 
ogy to be completed to begin treatment. Bernard did not seem to be aware of the gap 
that separated the theoretical conception of treatments at the bench in the lab from 
their true bedside effectiveness. In fact, the relationship between physiology and 
medicine remained largely programmatic for Claude Bernard, and it is doubtful that 
he in fact implemented experimental medicine. 

This was no longer the case, however, later in biology — particularly in bacteri- 
ology of the Pasteurian tradition, where vaccines developed experimentally in the 
laboratory led to an effective transformation of therapeutic practices. The identifi- 
cation of disease pathogens and the mastery of techniques to attenuate their virulence 
seemed at first to realize the Bernardian ideal of a deterministic experimental
medicine, effective in 100% of cases. The public experiment in Pouilly-Le-Fort,
carried out in May 1881 by Pasteur, provided the model. Pasteur thought he had
found a way to reduce the virulence of the bacterium that causes anthrax (which kills
flocks of sheep). He tested his vaccine on 60 sheep: 25 were vaccinated,
then inoculated with virulent anthrax; 25 were inoculated with virulent anthrax
without being vaccinated; and 10 served as controls (neither vaccinated
nor inoculated). All the vaccinated sheep survived, and all the sheep inoculated without
being vaccinated died (Fagot-Largeault 2010).¹² This deterministic conception of
causality was also expressed at the time in “Koch’s postulates”,¹³ which stated that
in order to establish that a germ caused a disease, it was necessary to have proven 
that the micro-organism was always present in subjects suffering from the disease, 
and was never present in healthy subjects; that the agent could be isolated from an 
infected subject, cultivated in an inert medium, and the disease reproduced by 
inoculating it into a healthy subject. For these pioneers of bacteriology, who had a 
deterministic conception of the etiology of disease, experimentation did not require 
large numbers. Subsequent developments in infectious disease research, and later in 
chronic disease research, would nevertheless show that disease causation, which is 
often multifactorial and depends on host characteristics, is much less simple than 
Koch’s postulates assumed (Evans 1976; Fagot-Largeault 2010; Giroux 2011). Both 
the etiology of diseases and the efficacy of treatments rarely operate on an all-or- 
nothing binary model, hence the now-accepted general need for statistical evidence 
in the life sciences (Schwartz 1994) — including in laboratory experiments (Lemoine 
2017, pp. 100-104). At the end of the nineteenth and beginning of the twentieth 
centuries, the demands of experimental proof led bacteriology in the Pasteurian 
tradition to multiply “pathological experiments”, at the cost of an unparalleled 
moral bankruptcy. 

In order to demonstrate that a particular pathogen was the cause of a particular 
disease, physicians inoculated healthy subjects (often without their knowledge) with 
germs or tested dubious vaccines on subjects whom they contaminated and some- 
times killed. Pierre-Charles Bongrand, a French physician who denounced these 
practices, counted no fewer than 120 experiments involving more than 600 inoculations
practiced on individuals free of the disease, published in the medical press up to
1905.¹⁴ Because they created scandals, these pathological experiments gave rise to
an unprecedented ethical awareness that contributed to the introduction of the issue
of consent (even though the question of what exactly consent authorized was the
subject of debate). In 1859, the Lyon correctional court (lightly) sentenced two
physicians for having inoculated a 10-year-old child with syphilis for experimental
purposes and pointed out that physicians were not authorized to carry out experiments
out of pure scientific curiosity. The text of the judgement set a precedent in
France until the adoption of the bioethics law of 1988: until then, experimentation
was permitted only in the context of a therapeutic justification (or alibi). In 1900,
following the Neisser affair,¹⁵ the more permissive code of deontology of the
Prussian physicians prohibited experiments for purposes other than therapeutic ones
in three cases: on minors and persons lacking legal capacity; if there
was no consent; and if consent was not preceded by information on the risks of the
intervention. The Richtlinien of 1931, by recognizing the right of physicians to
experiment for scientific purposes, reaffirmed these principles. After the war, the
judgement of the Nuremberg tribunal in the trial of Nazi doctors integrated these
different rules for the first time into an international law on experimentation on
human beings, distinguished from therapeutic innovation.

The Nuremberg Code was inspired by a contractualist ethic based on the autonomy
of the subject of experiment (obligation of free and informed consent, freedom
for the subject to interrupt the experiment at any time, prohibition of experimentation
on minors and persons incapable of consenting), an ethic of the protection of the
person (impossibility of undertaking experiments that risk killing or injuring, obligation
to avoid all unnecessary suffering, the need for prior studies on animals, the obligation
of prudence on the part of the experimenter) and a principle
of scientific utility (scientific competence of the experimenter, the need for the
experiment to benefit society in a way that could not be obtained by other means,
evaluation of the risk-benefit balance). This text inspired all the ethical regulations
on medical research established after the war in the different countries.

¹² On Pouilly-Le-Fort, see also Gaudillière (2006).

¹³ Also called the “Henle-Koch” postulates, in reference to the work of the physician Jacob Henle
(1809-1885) and his student Robert Koch (1843-1910). On these postulates and the tradition of
laboratory medicine, see for example Gaudillière (2006).

¹⁴ Pierre-Charles Bongrand (1882-1928), De l’expérimentation sur l’homme: sa valeur scientifique
et sa légitimité (1905).

¹⁵ In 1898, Albert Neisser (1855-1916) conducted a clinical trial of a vaccine against syphilis,
prepared from the blood of syphilitic patients, on prostitutes who were not informed about the
vaccine and who, in some cases, contracted syphilis. The scandal caused by this affair led to debates
in the Prussian parliament and to the promulgation of the directive of 1900.


9.3.3 Randomized Controlled Trials


It is customary to say that the methodological paradigm (gold standard) of contemporary
clinical trials was established in 1948, by a trial carried out by the Medical
Research Council in 1947-1948 under the direction of the British statistician Austin
Bradford Hill (1897-1991), a pupil of Fisher (Medical Research Council 1948). This
trial, which involved about 100 patients, sought to evaluate the effectiveness of an
antibiotic, streptomycin, in the treatment of tuberculosis. Compared to previous
trials, this trial was unique in that it combined the following methodological
characteristics, which henceforth defined randomized controlled trials (RCTs):

— It was based on an experimental design piloted from the outset by statisticians,
who determined the number of subjects required, the organization of the exper- 
iment, the criteria for inclusion in the trial, the criteria for evaluating the treat- 
ment, and finally the expected results so that the treatment demonstrated, with a 
quantified risk of error, its effectiveness. 

— The trial included an experimental group (receiving streptomycin) and a control 
group (treated in the same way as tuberculosis was treated at the time: by bed 
rest). The allocation of patients to the two groups was randomized. Randomiza- 
tion ensured that there was no systematic difference between the two groups, 
other than treatment; the inclusion of each subject in one group or the other was 
independent of all other characteristics of the subject (including the severity of his 
or her illness). 

— Treatment assessment was blinded, based on anonymized radiological images, by 
physicians who had not followed the patients and who did not know what 
treatment the subject had received. 

— Finally, the trial was multicentred, i.e. the study was carried out simultaneously in 
several health care centers, which allowed for more general conclusions by 
limiting geographic or economic selection bias. 
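The contrast between allocation by alternation and concealed random allocation can be sketched in a few lines of Python. This is a hypothetical illustration, not the Medical Research Council’s actual procedure. The function and class names, the arm labels and the seed are all invented here. The sketch also assumes a balanced random permutation, whereas the text states only that the 1947-1948 trial used randomly generated numbers and sealed envelopes.

```python
import random
from typing import List, Tuple

# Illustrative arm labels (hypothetical), borrowed from the 1947-1948 trial.
ARMS: Tuple[str, str] = ("streptomycin", "bed rest")


def alternation(n_patients: int) -> List[str]:
    """Strict alternation: first patient to the experimental arm, second to control, etc.

    Anyone who knows how many patients have already been enrolled can
    predict the next assignment.
    """
    return [ARMS[i % 2] for i in range(n_patients)]


class SealedEnvelopes:
    """Hypothetical sketch of concealed random allocation.

    The sequence is fixed in advance from random numbers alone, so it cannot
    depend on any characteristic of the patients (severity included), and each
    assignment is revealed only after a patient has been enrolled.
    """

    def __init__(self, n_patients: int, seed: int) -> None:
        rng = random.Random(seed)
        # Balanced permutation (an assumption): equal arm sizes, with the
        # extra slot going to the first arm when n_patients is odd.
        sequence = [ARMS[0]] * ((n_patients + 1) // 2) + [ARMS[1]] * (n_patients // 2)
        rng.shuffle(sequence)
        self._sequence = sequence
        self._opened: List[Tuple[str, str]] = []  # audit trail of (patient, arm)

    def open_next(self, patient_id: str) -> str:
        """Record the enrolled patient, then reveal the next envelope."""
        if len(self._opened) >= len(self._sequence):
            raise RuntimeError("no sealed envelopes left")
        arm = self._sequence[len(self._opened)]
        self._opened.append((patient_id, arm))
        return arm


envelopes = SealedEnvelopes(n_patients=10, seed=1948)  # arbitrary seed
print([envelopes.open_next(f"patient-{i}") for i in range(10)])
print(alternation(4))
```

The design point is that the allocator never receives any patient data, so each assignment is independent of the subject’s characteristics. Concealment also matters: the arm is revealed only after enrolment, which removes the clinician’s ability to foresee the next assignment. Alternation leaves that ability intact.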


Under these conditions, the Medical Research Council trial of 1947-1948 succeeded
both in proving the efficacy of streptomycin in the treatment of tuberculosis (there
were 4 deaths out of 55 patients in the experimental group compared with 14/52 in
the control group, and the difference was judged to be statistically significant¹⁶), and
in demonstrating the efficacy of the randomized controlled trial for evaluating 
treatments. In contrast, the end of the nineteenth century and the first decades of 
the twentieth century correspond to a period of methodological trial and error in the 
conduct of clinical trials. During this period, the principle of the control group had 
yet to be established, and the number of subjects was left to the discretion of the 
experimenter. Experimental results were generally evaluated by the so-called “his- 
torical control” method: for example, by comparing the mortality rate obtained with 
a new treatment with the mortality rate for the same disease in the same hospital the 
previous year — but it was known that diseases can vary greatly from one year to the 
next. When care was taken to set up a control group, patients were not randomly 
assigned to one group or the other: the control group often overlapped with 
pre-existing divisions (e.g., another department in the same hospital), which intro- 
duced confounding factors; or the patients were assigned by the method of alterna- 
tion: the first subject in the experimental group, the second in the control group, etc. 
(this may have been the method that Louis used in his bloodletting experiments). But
even this method was not free of bias: knowing which group the next patient would
be assigned to may, for example, have led a physician to exclude a patient from the
experiment, if the physician knew that the patient was seriously ill and would receive
a placebo (Fagot-Largeault 1992). Finally, although some “reforming” physicians
tried to overcome the subjective biases of the individual clinician and test treatments
on a large number of patients and in different geographical areas to remedy the
effects of chance, in practice these trials often came up against the difficulty of
standardizing the classification of patients, the dose of the administered treatment,
and the assessment of therapeutic outcomes in different care centers. As a result, a
number of trials carried out in the 1930s and during the war failed to produce
conclusive results because the tests were carried out on too few patients, because
the doctors following the patients did not respect the doses stipulated in the protocol,
biased the allocation of patients or tried several remedies at the same time, or even
refrained (often for ethical or political reasons) from setting up a control group
(Marks 1999; Löwy 1998; Löwy 1999). Two standards of objectivity, that of
individual expertise and that of the impersonal methodological criterion of proof,
competed in the clinical trials of this period. In contrast, RCTs are based on blinding
physicians and distrusting their judgement and impartiality: trust is transferred to the
method and no longer rests on the individual experts (Marks 1999).

¹⁶ The number of patients needed to conduct a trial depends on the difference between the
percentage of cures in the experimental group and the control group. “If the reference treatment
to which the drug candidate is compared cures 50% of patients, 200 patients will be needed to
demonstrate a 20% improvement (70% cured patients). But if the drug candidate improves outcomes
by only 5%, which is much more common today, it will take 3500 patients” (Pignarre 2003,
p. 23).

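Two of the numerical claims discussed above can be checked with textbook formulas. The first is that 4 deaths out of 55 versus 14 out of 52 is a statistically significant difference. The second is that the patient numbers quoted from Pignarre (2003) grow sharply as the expected improvement shrinks. The sketch below rests on stated assumptions. It uses a pooled two-sided z-test for two proportions, which is not necessarily the computation used in the 1948 report; a continuity-corrected or exact test would give a somewhat different p-value. It computes sample size with the standard normal-approximation formula, assuming a 5% two-sided significance level and 80% power, which Pignarre does not state. Under these assumptions, the test gives z close to 2.7 and a p-value below 0.01. The formula gives roughly 190 patients in total for a 50% to 70% improvement and roughly 3,100 for 50% to 55%. These are the same orders of magnitude as the quoted 200 and 3500.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(events_a: int, n_a: int, events_b: int, n_b: int):
    """Pooled two-sided z-test for a difference between two proportions.

    Returns (z, p_value); z is positive when group b has the higher proportion.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return z, p_value


def patients_per_group(p_ref: float, p_new: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per arm for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_ref + p_new) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p_ref * (1 - p_ref) + p_new * (1 - p_new)))
    return math.ceil(root ** 2 / (p_ref - p_new) ** 2)


# Deaths reported for the MRC trial: 4/55 (streptomycin) vs 14/52 (bed rest).
z, p = two_proportion_z_test(4, 55, 14, 52)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")

# Pignarre's scenarios: the reference treatment cures 50% of patients.
for p_new in (0.70, 0.55):
    n = patients_per_group(0.50, p_new)
    print(f"{p_new:.0%} cured with the new drug: about {2 * n} patients in total")
```

The required size scales with the inverse square of the expected difference. Dividing the expected improvement by four (20 points down to 5) multiplies the number of patients by roughly sixteen.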
In this respect, it is necessary to distinguish randomization from double blinding 
since these two principles are often combined in RCTs (but not in the original 
1947-1948 trial). Randomization remedies the effects of chance, whereas double- 
blinding remedies suggestion bias. Randomization was pioneered by Fisher, who 
showed that it was the best way to control for unnoticed confounders — in this case, 
systematic differences between groups due to causes other than treatment. In the 
Medical Research Council trial of 1947-48, randomization (using randomly gener- 
ated numbers and a system of sealed envelopes) was deemed crucial by Bradford 
Hill because it removed the possibility of intervention and the responsibility of 
clinicians in assigning patients to one arm of the trial or another: in his view, 
randomization was not so much a cure for objective bias (spontaneous remission 
of disease in some patients) as it was a cure for subjective bias, i.e., bias introduced 
by clinicians’ intervention in group formation (Amann 2012). In contrast, single- 
blinding (if patients do not know which group they are assigned to) or double- 
blinding (if patients and physicians do not know which group each patient is assigned 
to), and the placebo principle seek to combat the influence of imagination 
(of patients and physicians) on the results. The placebo test was invented in the 
eighteenth century to highlight the role played by suggestion in certain controversial 
treatments (first mesmerism, then homeopathy) (Kaptchuk 1998). In the original 
RCT of 1947-1948, the subjects in the control group did not receive a placebo: 
Bradford Hill considered that the placebo, better from a methodological point of 
view, would have been problematic from an ethical point of view, because it 
constituted a form of lying to the patient. Therefore, the trial was not double-blind, 
since the patients in the experimental group, the physicians and the nursing staff 
knew which patients were receiving streptomycin. The patients in the control group, 
on the other hand, did not know which group they were assigned to — in fact, they did 
not even know they were participating in a clinical trial. They were not, however,
placed in the same conditions as those in the experimental group since only the 
patients in the experimental group received the (antibiotic) shots (Keel 2011). 
Single- or double-blinding and placebo improve the quality of evidence in an RCT 
but are conceptually distinct from randomization. RCTs have rapidly become the 
standard in the evaluation of therapies, in an environment marked by the prolifera- 
tion of new drugs and the need for pre-marketing control of pharmaceuticals. In the 
United States, the thalidomide scandal¹⁷ in 1962 led to the Kefauver Amendment
which gave the American Food and Drug Administration the authority to require
manufacturers to evaluate the toxicity and efficacy of drugs by means of randomized
controlled trials before they were placed on the market.¹⁸ The obligation to define a
methodology for sorting out effective drugs was a response to pressure from the
pharmaceutical industry, health system managers and the public as users of health
care. Today, RCTs sit at the top of the hierarchy of evidence defined by EBM
(Evidence-Based Medicine), a movement born in the 1990s that aims to promote the
critical evaluation of the evidence on which medical practice is based and to
encourage a more scientific clinical medicine. The second level of evidence is
observational studies in epidemiology (which, like RCTs, provide evidence of a
statistical nature, involving the observation of large numbers of patients and the use
of techniques derived from mathematical statistics). The third level is pathophysiology,
i.e. laboratory physiology or biology. The fourth is expert opinion.¹⁹ EBM thus
consecrates the victory of the “empiricals” over the “rationals”: in a way, we are
witnessing the posthumous revenge of the numerists (Louis) over the laboratory
biologists (Claude Bernard). RCT methodology, however, far removed from the
ordinary clinical relationship, has been difficult for physicians and patients to accept,
and has given rise to numerous ethical and methodological questions.

From an ethical point of view, the objections relate to the very principle of the 
control group, the possible administration of a placebo, or the impossibility of 
individually adapting the dosage in a clinical trial. The first argument is that if a 
drug is presumed to be effective, it is not ethical to deprive the control-group patient 
of it. Bradford Hill himself recognized the significance of this objection: in the 
1947-48 trial, only the scarcity of streptomycin in England in the immediate post- 
war period had overcome this objection (the shortage prevented all patients from 
being treated with streptomycin and thus justified the principle of reserving the 
treatment for the experimental group). To this argument, proponents of the clinical 
trial respond that giving the treatment is no more ethical than not giving it as long as
its efficacy has not been proven, and that we are in a situation of “clinical
equipoise” (i.e., uncertainty about the relative value of two treatments): in some
cases, it is the new treatment that turns out to be harmful or less effective than the
placebo or reference treatment (Ferry-Danini 2020).²⁰ Another issue is whether or
not it is ethical to administer a placebo to the control group, even when subjects are
informed that their participation in the experiment exposes them to receiving a
placebo. The Tokyo (1975) revision of the World Medical Association’s Helsinki
Declaration incorporated this argument by requiring that therapeutic trials always be
conducted against the best available treatment, not against a placebo, even though it
may be methodologically easier (or less demanding for pharmaceutical companies)
to test against a placebo. Finally, the impossibility of individual dosing is the price to
pay for conclusive trials. The general argument is that it is more ethical to test
treatments by rigorous and conclusive procedures than to administer them without
testing. In other words, even if no way of practising medicine is completely satisfactory
from an ethical point of view, proponents of clinical trials argue that trials at
least avoid unnecessary risks (Hill 1962).

¹⁷ Thalidomide is a drug that had not been sufficiently tested on animals and that caused an
epidemic of phocomelia (severe malformation of the limbs at birth) in the children of mothers who
had used this drug between 1957 and 1961. Ten thousand children with this condition were born
worldwide.

¹⁸ Prior to this date, the United States had required a list of ingredients on pharmaceutical product
labels since 1906 and, since 1938 (Federal Food, Drug, and Cosmetic Act), the control of
drug toxicity.

¹⁹ On this hierarchy and the philosophical controversies it raises, see the excellent summaries by
Élodie Giroux (2011) and Maël Lemoine (2017).

On the other hand, it can be considered that the establishment of RCTs, involving 
the coordination of many actors, has on the whole favoured publicity and therefore 
greater ethical transparency of therapeutic trials, if we compare RCTs to the “wild”
trials that prevailed before the war (Fagot-Largeault 1985, 2000; Amiel 2011). In the 
United States, the Kefauver Amendment, which established the place of randomized 
controlled trials in the evaluation of treatments, was also the first law to include the 
requirement of consent in American legislation — although this requirement, as in the 
Helsinki Declaration (1964) promulgated by the World Medical Association, was 
still accompanied by limitations and exceptions that left considerable room for 
manoeuvre for the physician. The regulation of therapeutic trials and the protection 
of test subjects became stricter with the Tokyo revision of the Helsinki Declaration 
(1975): this text established the unconditional nature of consent, required that the 
interests of the individual patient always take precedence over those of science, 
imposed the mandatory prior examination of research projects by an independent 
ethics committee, and recommended that scientific journals no longer publish 
research that did not comply with these instructions. A few years later, the Belmont 
Report (1979) spelled out the main principles of the ethics of experimentation on 
human beings, which were likely to come into conflict with each other: the principle 
of respect for persons, based on the conviction that individuals must be treated as 
autonomous agents (free, informed, express consent), or must be protected when 
their autonomy is reduced; the principle of beneficence, which reaffirms primum 
non nocere and the necessary balance of risks and benefits; the principle of justice, 
which requires an equitable distribution of the burdens and opportunities of
experimentation, for example to prevent disadvantaged populations from bearing the risks
of experiments whose benefits will essentially go to the most privileged groups, or
certain categories of people from being excluded from the benefits of research because
their diseases are not studied. More recently, the second major revision of the Helsinki
Declaration (Edinburgh 2000) seeks to address the “therapeutic misconception”,
whereby a research procedure may be misinterpreted by the subject as an act of care.
It also defends the idea that patients who have participated in a study should be assured
of being able to benefit, in the long term, from the therapeutic means that the research
has shown to be superior. The evolution of regulations is moving towards an ever
more explicit distinction between care and research (whether the latter involves
healthy or sick subjects), which do not have the same aims, are not subject to the
same norms (medical ethics for the former, ethics committees for the latter), and tend
to assume more frankly the idea that there may be dilemmas or compromises
between divergent interests. These regulations and principles do not solve all the
ethical problems that clinical trials continue to pose: the relocation of clinical trials to
poor countries; the limits of the notion of free and informed consent when trials
recruit subjects primarily, in return for payment, from economically dependent
populations with little access to care; the evaluation of treatments conducted by
the pharmaceutical industry itself and biased by conflicts of interest; serious
accidents on a large scale; the pharmaceutical industry focusing on the manufacture of
“me-too drugs” rather than on useful treatments, etc.

²⁰ Despite this, “compassionate use” of certain non-validated treatments has sometimes been
authorized in cases of lethal diseases and where patients’ associations have demanded a “right to
trial” (notably during the AIDS epidemic in the 1980s-1990s), i.e., an equal right of access for
patients to molecules that may be efficacious.

Finally, from a methodological point of view, the external validity of clinical 
trials has been questioned, i.e. the relevance of their results for the general popula- 
tion. Indeed, a clinical trial often involves selective inclusion criteria which 
exclude, for example, patients who are too old, patients with atypical forms of the 
disease studied, or patients suffering simultaneously from several diseases, or 
receiving several treatments. While all these precautions, which serve to avoid 
confounding factors, are justified in terms of providing evidence of efficacy, they 
are problematic when it comes to applying a treatment to the general population 
(where, for example, patients are often elderly, suffer from multiple pathologies, take 
several medications at the same time, etc.).²¹ The value of RCTs can therefore be
questioned on the grounds of their artificiality, in favour of observational studies
where the confounding factors are less controlled but are more representative. More
generally, objections have been raised concerning the idea that RCTs constitute the
highest level of evidence, superior in principle to observational (epidemiological) or
pathophysiological (laboratory experimentation) evidence.²² On the one hand, RCTs
cannot supplant other types of evidence. RCTs are in fact difficult to carry out for
certain pathologies (the principles of the control group, double-blind treatment or
placebo are often difficult to apply in surgery or psychiatry), useless in certain cases
(Bradford Hill himself said that there was no need for RCTs for diseases that were
100% lethal) (Hill 1962), and above all they are often impossible to carry out for
ethical reasons: one cannot deliberately expose patients to a harmful factor
(e.g. smoking) in order to prove its harmfulness, hence the need for observational
epidemiological surveys (“case-control surveys”, “prospective cohort surveys”),
which operate on the model of quasi-experiments, to study these causal links.
Furthermore, we can question the absolute partition that the EBM hierarchy institutes
between experimental studies (clinical trials) and observational studies (epidemiology
of risk factors), which assumes (classically) that only the former allow causality
to be established: it is accepted today that the judgment of causality in disease
etiology is based on a bundle of converging arguments (notably Bradford Hill’s
criteria; Hill himself did not accept this radical cut-off between RCTs and epidemiology),
some of which are experimental in nature and others of an observational
nature (Giroux 2011, pp. 425-435; Lemoine 2017, pp. 75-95). Finally, and above all,
RCTs do not make it possible to understand the mechanism of action of the
treatments whose efficacy they demonstrate. Here we again encounter the criticism
that Claude Bernard addressed to the numerical method: this explanatory dimension
is essential to medical knowledge (Lemoine 2017, pp. 117-131).

²¹ For a discussion of these difficulties, see Lemoine (2017, pp. 105-108), who provides further
references.

²² For a synthetic introduction to this debate, see Giroux (2011, pp. 424-428) and Lemoine (2017,
pp. 97-115).


9.4 Conclusion 


The notion of medical experimentation covers different concepts and practices,
which have evolved over the long history of medicine. If experimentation is defined 
by uncertainty and the trial and error consubstantial with clinical practice, then it has 
belonged to medicine since its origins. Experimentation as the testing of new 
treatments on sick or healthy subjects has been the object of a more explicit staging, 
in a theater of proof, since the modern period. Finally, experimentation in the sense 
of a methodical comparative trial is intimately linked to the development, since the 
nineteenth century, of clinical statistics on the one hand, and of the experimental 
method on the other. All these “layers” of the notion of experimentation are still 
present to some degree in contemporary clinical trials, which are multi-phase 
processes combining different types of evidence and experimental approaches 
(Pignarre 2003, p. 22). 

Clinical trials are often preceded by a phase of so-called “pre-clinical” studies, 
carried out in the laboratory, where the future treatment is developed and tested 
in vitro and in vivo on animals. This first phase evokes the “rational” approach of the 
Bernardian (or Pasteurian) tradition, in which remedies are deduced from an under- 
standing of biological mechanisms and the causes of pathologies. In RCTs, 
researchers prioritize the testing of molecules that, after laboratory study, are thought 
to act on the explanatory mechanisms of the diseases concerned. It must be recog- 
nized, however, that this “rational” approach is not the only one used: pharmaco- 
logical innovation also takes a more empirical route, that of screening, which 
consists of inventing “in reverse”: first, a molecule is available and then, without 
any preconceived idea, we look to see if this molecule might be effective in this or
that pathology, using animal models (Pignarre 2003, p. 41). In any case, this 
preliminary study phase is “experimental” in another sense: experimentation here 
refers to laboratory experimentation on animals. Only after this experimentation 
does the actual clinical trial begin on human subjects. This trial has four phases. 
Phase I, in which drug candidates are tested on 20 to 100 healthy volunteers, is used 
to observe the effects of the treatment on the human body, to verify its safety, and to 
determine the dosage at which it is toxic. Phase II, which involves 100 to 1000 
patients, seeks to determine whether the treatment is effective for the disease in 
question and to determine the optimal dose. Here we find our second meaning of 
medical experimentation (as a therapeutic trial). Phase III, which includes between
1000 and 5000 patients, is the comparative trial (comparison against a reference
treatment or against a placebo), which provides statistical proof of efficacy and
tolerance: this is the RCT proper, in line with the numerical method. If this phase
is conclusive, a marketing authorization (MA) is then granted. Phase IV,
known as pharmacovigilance, then begins. It involves more than 10,000 patients and is
used to detect rarer or longer-term adverse effects. Clinical trials are thus a long-term
process (about 10 years before market approval), during which several forms of
experimentation follow one another and are combined: laboratory experiments,
“daring trials”, clinical trials on large numbers of patients based on statistical
evidence, and finally the more traditional “clinical experiment” that accompanies the
large-scale distribution of the treatment.
Chapter 10
The Medical Clinic as an Experimental Practice


Jean-Christophe Weber 


Clinical medicine is often construed as an art that lacks the nobility and
consistency of scientific knowledge. In the view of physicians and laypeople alike,
medical science refers to the use of biotechnologies, clinical trials, and laboratories,
and the clinic is thought to lie far from all this. Yet Aristotle had already made experience an
indispensable prerequisite for tekhné, Claude Bernard saw it as the fruit of repeated
observations, and Canguilhem demonstrated the proximity of therapy and experimentation.
Can we today give more content to the proposition that clinical medicine is an 
experimental practice? We will argue that the clinic is the specific laboratory of 
medicine. First, we will rely on Claude Bernard to show, contra his explicit thesis, 
that the medical clinic is part of the experimental method (Sect. 10.1.1), which 
includes observation and experimentation (Sect. 10.1.2). We will then examine 
whether clinical practice, analogous to a laboratory experiment, satisfies the condi- 
tions of experimentation (Sect. 10.2), before considering the difficulties and biases it 
faces (Sects. 10.3, 10.4, and 10.5). 


10.1 Experimental Medicine 


For Claude Bernard, experimental medicine was the medicine that finally emerged 
from a state of backwardness. While preserving some of the achievements of the 
past, experimental medicine detached itself from Hippocratic expectant medicine 
and from empirical activist medicine.¹ The qualifier “experimental” introduced a
partition within the genus, between two or more species of medicine. The ideal type was
an experimental medicine, able to take advantage of the physical-chemical sciences,
whose new physiological method, even better suited to the study of living
beings, would set medicine on its definitive scientific path. The gradual
adoption of the “method of investigation common to the experimental sciences”
(Bernard 1966, p. 25) would enable medicine to gradually abandon systems and
doctrines, which served fallaciously as general premises for deductions without
verification. To these systems Bernard opposed theories produced by induction, always
provisional and revisable through the addition of new facts. The only axiomatic postulate
not open to discussion was the determinism of the phenomena observed or produced, to
which the experimental method endeavoured to assign proximate causes. First causes,
which belonged to metaphysics, remained outside the domain of Bernardian science.

The royal road to science was thus the experimental road, which was the 
“immediate and rigorous application of reasoning to the facts that observation and 
experimentation provide us with” (ibid., p. 26). One can read in Bernard a dialectic 
between ideas, facts, and reasoning. The initial observation may be fortuitous and 
passive, but then everything is set in motion by the operation of reasoning, which
seeks to produce new facts by undertaking an experiment. These are facts that a theory
sets out to construct even as it exposes itself to them and to possible
revision. Rather than opposing passive observation and active experimentation,
Bernard emphasized that the experimental method includes rigorous observations 
of facts, whether they are spontaneous or obtained by experimentation, i.e. in 
conditions that the experimenter has created and determined. Observations and 
experiments are associated in the process of investigation, itself inextricably linked 
to the intellectual process that Bernard calls “experimental reasoning”. But he also 
uses this term as a quasi-synonym for the experimental method as a whole. And 
while experimentation generally consists of the voluntary introduction of a variation 
or disorder, these can also be provided spontaneously by disease, without interven- 
tion by the experimenter. 

Pathological lesions [which] are true experiments from which the physician and the
physiologist profit, without there being any premeditation on their part to provoke these
lesions which are the result of the disease. [...] Medicine has true experiments, although
these are spontaneous and not provoked by the physician (ibid., p. 38).


The intentionality that matters is that of judgment, comparison and control, more
than that of introducing the disorder. We begin to understand how the clinic is
already potentially a space of experimentation, owing to the variations that
illness provides.

¹ “The Hippocratist, who believes in the healing power of nature and little in the curative action of
remedies, quietly follows the course of the disease; he remains in expectation, limiting himself to
favouring the happy tendencies of nature by a few simple medications. The empiricist, who believes
in the action of remedies as a means to change the direction of diseases and to cure them, is content
to observe empirically the actions of medicines without trying to understand scientifically the
mechanism. He is never at a loss; when a remedy has failed, he tries another; he always has
recipes or formulas at his service for all cases” (Bernard 1966, p. 292).


10.1.1 The Clinic as an Experimental Practice 


What arguments can be found in Claude Bernard’s work to support the
proposition that the clinic is an experimental practice? When we speak of the
physician’s experience, or of a physician who has experience, “it means the
information which he has gained in the practice of medicine. [...] Subsequently the
word experience in the concrete was extended to cover the facts which give us
experimental information about things” (ibid., pp. 39-40).


When one speaks in a concrete way, and when one says one is doing experiments or making 
observations, this means that one engages in investigation and research, that one attempts 
trials, tests, with the aim of acquiring facts from which the mind, with the help of reasoning, 
will be able to draw knowledge or instruction (ibid., p. 40). 


To practice medicine is therefore to undertake experiments, and to acquire experi- 
ence is to learn the facts and to let oneself be taught by the facts. To make 
observations is therefore to engage in research. Medical observation can be fully 
included in this framework because observation means first of all “the exact obser- 
vation of a fact with the help of appropriate means of investigation and study”, and 
by extension the facts noted (ibid., p. 40). Observation plays a supporting role for the 
reasoning that prepares the experiment: 


When we say to rely on observation and to acquire experience, this means that observation 
is the fulcrum of the reasoning mind, and experience the fulcrum of the concluding mind, or 
better still, the fruit of right reasoning applied to the interpretation of facts (ibid., emphasis 
added). 


The experience gained is the result of this observation-reasoning-experiment
dynamic:

Observation is therefore what shows the facts; experience is what teaches us about the facts
and what gives experience in relation to a thing. But since this understanding can only come
about through comparison and judgment, that is, because of reasoning, it follows that man
alone can acquire experience and perfect himself through it (ibid.).


All of this can be directly applied to the medical clinic: one engages in investigation 
by means of anamnesis and physical examination, with the aim of acquiring and 
producing (e.g. through the artifice of the stethoscope) facts from which the
mind, with the aid of clinical reasoning, can derive knowledge (the diagnosis). The
facts of observation are followed by an experiment that puts them to the test (for
example, checking a hypothesis by means of a complementary examination). Multiplying
observations and experiments leads to the acquisition of experience, and this experience
makes one more capable of experimenting: 
An old physician who has often administered treatments and who has treated many patients
will be more experienced, that is to say, will experiment better on his new patients because 
he has learned from the experiments he has made on others. The surgeon who has often 
performed operations in various cases will learn and improve experimentally. Thus, as we 
can see, instruction only comes through experience (ibid., p. 152). 


What then distinguishes the clinic from the laboratory of experimental medicine? 


There is first of all a kind of unconscious and empirical instruction or experience, which one 
obtains by the practice of each thing. But the knowledge that one acquires in this way is 
nevertheless necessarily accompanied by a vague experimental reasoning that one makes 
without realizing it, and as a result of which one brings the facts together in order to make a 
judgment about them. Experience can thus be acquired by an empirical and unconscious 
reasoning; but this obscure and spontaneous walk of the mind has been erected by the 
scientist into a clear and reasoned method, which then proceeds more rapidly and in a 
conscious way towards a determined goal (ibid., p. 41). 


In other words, the simple repetition of a practice is already a source of empirical 
knowledge, and even a (quasi) experimental practice, even though it is obscure, 
spontaneous, and unconscious. Prior to the Bernardian method, to experiment is “to
wait for a teaching from gestures of which one has taken the initiative” (Canguilhem
1994, pp. 383-391). If we practise clinical medicine rigorously, do we not then move
from shadow into light?


10.1.2 Observation and/or Experimentation?


But is the clinical investigator an observer or an experimenter? 


The name of observer is given to the one who applies simple or complex investigation
procedures to the study of phenomena that he does not vary and that he collects,
consequently, as nature offers them. The name of experimenter is given to the one who uses
simple or complex investigation processes to vary or modify, for any purpose, the natural
phenomena and to make them appear in circumstances or in conditions in which nature did
not present them to him. In this sense, observation is the investigation of a natural
phenomenon, and experiment is the investigation of a phenomenon modified by the investigator
(Bernard 1966, pp. 44-45, emphasis added).


In any case, what counts for the observer is to go beyond simple observation and to 
reason, without which there would be no observational science. Outside of an 
experimental protocol or a therapeutic trial, we might spontaneously see the clinician
as an observer rather than an experimenter. But this would be to forget that the
institutions of medical consultation and hospitalization are not natural but artificial
conditions. Michel Foucault (1963) and others have emphasized this aspect of the
clinic.² Moreover, to prescribe a drug is to voluntarily introduce a variation into
natural phenomena and to study the resulting modifications.


² The parallel with the art of anthropological inquiry is striking. “In the art of inquiry [...] every
implementation is an experiment [...] in the sense of a scout blazing a trail and continuing on his way 
to see where it leads him. To experiment is to try certain things and observe what happens” (Ingold 
2017, p. 32). 




Even if we seek to cast the clinician as an observer and the physiologist as an 
experimenter, both “are investigators who seek to ascertain the facts as best they can 
and who employ for this purpose more or less complicated means of study, 
according to the complexity of the phenomena they are studying. They may both 
need the same manual and intellectual activity, the same skill, the same spirit of 
invention, to create and perfect the various devices or instruments of investigation 
which are common to most of them. Each science has, as it were, its own kind of 
investigation and its own set of instruments and special procedures” (Bernard 1966, 
p. 42). 

Let us draw some consequences from these Bernardian reminders. The clinical 
physician can be an experimental scientist worthy of Claude Bernard’s demands if 
he takes care of his investigative instruments to guard against errors of observation. 
Concern over instrument quality is not just a matter of ensuring that a blood pressure
monitor is well calibrated. It means being concerned about one’s readiness to be a
good observer: to train oneself through multiple observations which are then sifted
through experimentation, but also to train oneself to listen to the patient, who
constitutes a kind of instrument and who introduces variation into natural
phenomena. One may hesitate to consider oneself as an instrument.³ One can take advantage
of the details of the procedures necessary to arrive at “true science” only by going to 
the coalface oneself. As Bernard says, one must go through “a long and ghastly 
kitchen” in order to reach the “superb and dazzlingly lighted hall” (ibid., p. 44). 

One can only really access knowledge by going through the experience oneself. 
The clinic is an obligatory passage. 

Being in an experimental frame of mind is an unusual way of characterizing 
clinical activity. To deal with a patient, however, is to seek to solve an enigma, to 
problematize a difficulty. Bernard portrayed the scientist as Janus-faced: on the 
observer’s side, he had to be content with “purely and simply observing the 
phenomenon before his eyes” and “being the photographer of phenomena”. “We 
must observe without any preconceived idea; the observer’s mind must be passive, 
that is, must hold its peace; it listens to nature and writes at nature’s dictation”. The
other side is that of the experimenter who has preconceived ideas with which he
“must question nature and put all manner of queries to it”. His observation is
“provoked or premeditated” (ibid., p. 52). One then understands that these two
sides alternate dialectically, for the result of the provoked or premeditated observation
must in turn be correctly observed before being subjected to control. This
alternation is both active and passive: to question nature, to force it to reveal itself by
all sorts of questions, but to remain silent when it speaks and to observe its answers,
to listen to it to the end, not to answer in its place “nor to listen partially to its answers
by taking, from the results of an experiment, only those which support or confirm his
hypothesis” (ibid., p. 53).

³ Freud used this metaphor in his advice to physicians: the analyst must “turn his own unconscious,
as a receiving organ, towards the patient’s transmitting unconscious. He must adjust himself to the
patient just as a telephone receiver is adjusted to the transmitting microphone” (Freud 1998,
pp. 143-154).

Questioning nature, Bernard repeats. Replace “nature” with “the patient”, and we 
have a discourse on clinical method! One must always be ready to revise
preconceived ideas, assumptions and a priori judgments. For the clinician, to observe without
prejudice is a difficult disposition to adopt; it is almost a form of asceticism. For one is also
an experimenter: one observes with one’s knowledge, experience, and prejudices. At
the same time, one must be prepared to observe or hear an unexpected phenomenon
that arouses curiosity. Here is a piece of advice in the spirit of Bernardian prudence: keep silent,
listen to the patient, and write at their dictation. This is how to get as close
as possible to the position of the observer, a necessary condition to produce well- 
observed facts. The more the clinician adopts the dispositions and methods of the 
Bernardian experimenter, the better the clinician he becomes! 

It therefore seems legitimate to consider clinical practice as an experiment in the 
Bernardian sense of the term, which is apparently contrary to Bernard’s own thesis. 
In the remainder of this chapter, we will consider several points that merit 
discussion. 


10.2 Typology and Criteria of Experimentation


Among the different types of experimentation, the clinic is more closely related to 
the laboratory than to indirect or comparative experimentation or even field exper- 
imentation. The clinic is the laboratory of medicine, in the double sense of the 
genitive: it is the place where medicine’s hypotheses are verified (or not), but also the 
place where medicine is designed. It is not, of course, a question of constituting a 
sample, and the patient is generally his own control: the conditions for comparison 
are not satisfied. The iteration of individual cases serves experience, even when it 
leads to questioning what we thought we knew. It sometimes happens that succes- 
sive cases of the same disease are collected in series: in this case, we leave clinical 
medicine for more systematic research, which distances us somewhat from our topic. 

In the narrower sense, to experiment means to produce or modify a phenomenon 
in a voluntary, systematic, and controlled manner in order to isolate one or more 
parameters that contribute to produce the phenomenon under study. We have already 
mentioned that in the diagnostic stage the phenomenon is the spontaneous result of 
the disease. 


When one sees a phenomenon that one is not used to seeing, one must always ask oneself to 
what it may be due, or in other words, what the proximate cause is; then an answer or an idea 
presents itself to the mind that must be submitted to experience (Bernard 1966, p. 217). 


On the other hand, the introduction of a treatment corresponds to the voluntary 
modification of a phenomenon: “Any physician who gives active drugs to his 
patients is cooperating in the construction of experimental medicine” (ibid., p. 183). 
Let us see to what extent the three conditions of experimentation (isolate,
manipulate, reproduce) are present in the clinic. 

It is a question of isolating, within the complexity of physical phenomena, “the 
one phenomenon on which our studies are brought to bear” (ibid., p. 183), because 
“the object of analysis, in biological as in physical-chemical sciences, is after all, to 
determine and, as far as possible, to isolate the conditions governing the occurrence 
of each phenomenon” (ibid., p. 116). In the clinic, we never reach this degree of 
isolation. Certainly, the analysis of a complex situation is in some ways analogous to 
the isolation of determining variables which, if not independent, are at least
separable⁴: the methodical analytical decomposition of the case is necessary for the
relevant choice of tests to be carried out to confirm or refute the hypotheses and for
an adequate therapeutic intervention. It is true that disciplinary specialization and the
use of biotechnologies contribute to refining the isolation of phenomena, but the
“object” of the experimental clinic remains the patient, of whom each specialist
grasps only a truncated image. Moreover, the clinician never ceases to shift the
focus of his observation between the injured organ and the sick individual, who
ultimately constitutes the operative unit. Let us note, in addition, that for Bernard, the
individuality of the living organism makes physiological synthesis necessary
after the analysis of the phenomena. In medical practice, analysis and synthesis are
the responsibility of the clinician. Therefore, the criterion of isolation of the phe- 
nomenon to be studied cannot be completely met, even if the clinician tries to isolate 
what will be the object of his diagnostic investigations and his therapeutic trials. 
Most often, the degree of precision does not reach the level of physicochemical 
determinism. 

Isolating variables is the prerequisite for their independent manipulation. Because
of the above, the principle of manipulating each variable independently cannot be
applied. In clinical practice, the ability to act on a single variable is never
achieved. Isolating and manipulating allow the phenomenon to be reproduced “at
the will of the experimenter” (ibid., p. 109) once its condition is known. But even if we
think we have discovered the condition of a phenomenon (for example, the bacterium
responsible for an infectious state), we must refrain from experimentally reproducing
the conditions that would provoke a new episode of illness.⁵

To these criteria of experimentation must be added the fact that confirming the
idea by experiment is insufficient: experimenters “should still doubt and require a
counterproof” (ibid., p. 91), which alone is able to decide “whether the relation of
cause to effect, which we seek in phenomena, has been found. To do this, the
admitted cause is removed to see if the effect persists, relying on that old and
absolutely true adage: Sublata causa, tollitur effectus.⁶ This is what is still called the
experimentum crucis” (ibid.).

⁴ For example, it could be hypothesized that a patient’s condition is the result of a bacterial renal
infection favoured by bladder retention due to a prostate adenoma, associated with decompensation
of a pre-existing cardiac disease and dehydration, three parameters that cause confusion and renal
impairment.

⁵ Some situations are spontaneous reproductions (e.g. the involuntary reintroduction of a drug with
side effects), but they cannot be deliberate, because of the risk involved.

The counterproof, by contrast, brings us closer to the clinic: a large part of ordinary therapy is
designed as a counterproof, which, by acting on the causes, aims to make the
consequences disappear. And in situations where the cause is not yet certain, the
patient is offered a so-called “test treatment”. The effectiveness of this treatment
(i.e. a significant improvement in health status) then provides the confirmation that
one had not been able to obtain otherwise. Conversely, the failure of the test
treatment (for example, an anti-tuberculosis therapy) makes it
possible to rule out one of the causes considered for the observed phenomena.
However, apart from treatments that “target” the cause and whose outcome validates
or invalidates the hypothesis, the best-established knowledge is usually only a
probability with a confidence interval, which is quite far from the proof provided by 
an experimentum crucis. The clinic therefore almost never meets the conditions of 
purity of the scientific method. Whereas science “ought not to encumber itself with 
apparent facts collected without precision” and “rejects the indeterminate” (ibid., 
p. 90), the clinic cannot discard such facts on these grounds. The duties of the 
caregiving physician are not the same as those of the scientist. The responsibility is 
not the same. As we can see, the clinic conceived as an experimentation faces a great 
number of closely intertwined difficulties and biases. 


10.3 Methodological Difficulties


The complexity of the situation and the multiple interactions of all the dimensions 
that constitute the clinic are an obstacle to the independent analysis of perfectly 
isolated conditions. The singularity of the case makes it impossible to replicate an 
experiment. These methodological difficulties should not, however, lead to the 
abandonment of the experimental approach in the clinic. “If I am not practising 
clinical medicine here, I must nevertheless take account of it and assign it the first 
place in experimental medicine” (Bernard 1966, pp. 281-282).

Of course, the term “first place” is equivocal, and perhaps refers more to an initial 
stage than to a prominent place. But if the clinic is a research laboratory on a 
Lilliputian scale, diagnostic investigation nevertheless deserves its name: it is an 
investigation. And if “surely the physician meets the patient by chance” (ibid., 
p. 268), what I call “arranging the doable” (Weber 2017, p. 56) corresponds in 
every respect to the first step of any experiment, which “consists in premeditating 
and bringing to pass the conditions of the experiment” (Bernard 1966, p. 53). The
second step is to witness what happens during the consultation experience and to
observe its results, i.e. the provoked observation.⁷

⁶ Once the cause is removed, the effect disappears (a consequence of the thesis according to which
there is no effect without a cause).

In their reductive aim, which guarantees their efficiency, scientists seek to free themselves
from that which cannot be the object of experimentation, whereas the physician is 
inevitably confronted with it: 


The practice is exceedingly complex involving any number of social and extra-scientific 
questions. [...] The physician, in his treatment, often has to take account of the so-called 
influence of the moral over the physical, and also of any number of family and social 
considerations which have nothing to do with science. Therefore, an accomplished practic- 
ing physician should be not only learned in his science, but also upright and endowed with 
keenness, tact, and good sense (ibid., p. 329). 


10.4 Ethical and Political Problems 


The general principles of ethics in biomedical research, derived from the Nuremberg 
Code, then from the Belmont Report,⁸ and taken up in numerous declarations and
conventions (including the Oviedo Convention⁹), have contributed to the codification of
medical research, but they were also quite quickly given specific application in the clinic,
which is another argument for drawing medical research and clinical medicine 
together. In the clinic, as in research, it is above all a question of establishing the 
benefit/risk balance (principles of beneficence and non-maleficence) and of respect- 
ing autonomy (information and consent). Although the physician can be seized by a 
frenzy of experimental abuse, adopting an experimental point of view for the clinic is 
not harmful to the patient, and experimentation is even a duty when it is in the 
interest of the patient: “It is our duty and our right to perform an experiment on man 
whenever it can save his life, cure him or gain him some personal advantage” (ibid., 
p. 151). 

In his “Therapeutics, Experimentation, Responsibility”, Canguilhem quoted this 
passage and pointed out that the humanistic ideal was even better served by adopting 


7“T such circumstances, the physician’s initiative consists in seeing the fact that chance presents to 
him and in not letting it escape, and his only merit is accurate observation” (ibid., p. 268). 


8The Belmont Report, “Ethical Principles and Guidelines for the Protection of Human Subjects of
Research”, published in 1979 by the United States Department of Health, Education, and Welfare,
takes its name from the Belmont Conference Center in Maryland, USA, where it was developed by
a commission charged with establishing a normative framework for medical and behavioral
research in the wake of numerous scandals, including the Tuskegee syphilis study, in which
subjects suffering from syphilis who were enrolled in a research program continued merely to be
observed long after penicillin was discovered. Available at
hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html and in French at
erasme.ulb.ac.be/en/enseignement-recherche/comite-d-ethique/consensus-ethiques/rapport-belmont.

9The Convention on Human Rights and Biomedicine, drawn up in 1997 in Oviedo (Spain), is
intended to be a universal reference for the protection of human beings and their genetic heritage in
the context of biological and medical science. Signatory states are obliged to bring their legislation
into line with the principles set out in this Convention. It entered into force for France in 2012.


130 J.-C. Weber 


the experimental stance, contrary to the widespread view that man’s humanity had to 
be preserved against experimentation (Canguilhem 1994). From an ethical point of 
view, however, there is a major difference between the clinic and research. The 
scientist observes and modifies for the purpose of knowledge: 


When we say ‘making experiments or making observations’, we mean that we devote 
ourselves to investigation and to research, that we make attempts and trials in order to 
gain facts from which the mind, through reasoning, may draw knowledge or instruction 
(Bernard 1966, p. 40). 


The physician, on the other hand, observes in order to diagnose and modifies in order to treat.
While his actions and reasoning are similar to the scientist’s, his intention differs: the clinician
experiments as he seeks to treat someone better.


10.5 Bias of Clinical Experimentation 


The large number of biases that can affect the clinician might seem off-putting and
could seem to ruin once and for all our hypothesis of the clinic as an experimental practice. As a
human practice embedded in a linguistic relationship in which emotions, expectations
and misunderstandings operate, the clinic is vulnerable to all the classical
biases that experimentation seeks to control. The patient is all the more motivated
and eager to please because his health is at stake. Worse still, the practice relies to some
extent on these biases for its effectiveness, for example by using persuasive rhetoric 
to increase the patient’s confidence in the power of the remedy. 

In science, personal authority is dethroned by the impersonal authority of the 
method: “The revolution which the experimental method has effected in the sciences 
is this: it has put a scientific criterion in the place of personal authority” (ibid., p. 74). 
“The experimental method draws from within itself an impersonal authority which 
dominates science” (ibid., p. 76). 

The impersonal is suited to everything that derives from necessity, whereas the
clinic is broader: it deals with the contingent and sometimes the fortuitous, and the
observer is no more replaceable than the observed. When the physician uses the
impersonal authority of science, it is not always to efface himself, it can also be to 
increase his own persuasive power. Most importantly, the knowledge that the 
physician puts into practice (know-how, experience, inferential logic, and even his 
or her scientific knowledge) is eminently personal knowledge (Sturmberg and 
Topolski 2014). It also seems that in the medical clinic, the physician’s desire cannot
be eliminated. And yet, we should hear the impersonal resonate: the clinician
“personifies” (embodies) a singular complexion of impersonal determinations that 
run through him. Allow us to propose this formulation: tekhné operates almost 
despite the physician, even though the physician actualizes it in a singular way. 

Let us examine yet another bias in the light of the Bernardian canon. The scientist 
needs “complete freedom of mind” (Bernard 1966, p. 68), whereas the clinician 
works under the pressure of the patient’s request and the seriousness of the situation, 
without forgetting the constraints of care organization, activity production and
productivity. Even if the clinician does not control the situation, s/he must make 
decisions and act, without being able to suspend judgment for too long or to “refrain 
in ignorance” (ibid., p. 90). All these constraints draw medicine away from the ideal 
conditions of experimental work. Uncertainty can stimulate experimentation, but it
can also be distressing. In the more common cases, situations of uncertainty (chronic
diseases, multiple pathologies, psychosomatic entanglements) are more embarrassing
than worrying, and they easily lead the clinician to become a “systematiser” (ibid., p. 70),
that is to say, a simplifier.


10.6 Conclusion 


The problem that confronts the clinician is of great complexity, but this should not be 
a hindrance. On the contrary, it should be an incentive to investigate and experiment: 
“in any case we gain by experimenting”, even if it is by groping, by going
“according to a kind of intuition” guided by the probabilities one perceives.
Even when the subject is “entirely dark and unexplored”, it is necessary to try to “fish
in troubled waters, [...] to conduct experiments to see”. In these cases where one
experiments by groping, one nevertheless intends to “induce an observation [...]
with the object of bringing to birth an idea” (ibid., pp. 50-51, emphasis added).

If medicine is experimental, then it must be admitted that its object is much wider 
than that delimited by Bernard. Whereas “experimental criticism must only deal with 
facts and never with words” (ibid., p. 256), medicine deals with individuals whose 
words cannot be erased. However, none of the objections examined ruins the initial
assumption: the clinic is also a field of investigation, and the experimental method is
a model for the clinic. The latter, however, imposes its own conditions of impurity 
and complexity. 
Chapter 11
Experimentation in Archaeology


Nicolas Monteix 


Before defining the forms of implementation, the contributions, the constraints, and 
the limits of experimentation in archaeology, it is important to specify what is 
covered by this field of study. Far from the tenacious image that reduces archaeology
to the mere accumulation of artefacts, like an Indiana Jones proclaiming that a cross,
reputed to have been owned by Francisco Vasquez de Coronado, “[...] belongs in a
museum” (Spielberg 1989), archaeology is understood here as a science
of the past, based on the direct observation of material remains. The latter include, of 
course, artefacts but cannot be limited to them. 

The objects of archaeological research have multiplied over the last thirty years 
through a rapprochement with materials science (archaeometry), life science 
(archaeozoology and archaeobotany), environmental science and geography 
(archaeogeography). Whatever the consequences of this extension of the domains 
of archaeology, the postulate that underlies their value can be summarized in the idea 
that all or part of daily human activity, ordinary and extraordinary, is likely to leave a 
material imprint which, if it has been preserved, can be examined and interpreted in 
order to restore the action that produced it. 

The main — but not the only — mode of observation of material remains is the 
excavation of a site, which corresponds, in the words of Philippe Boissinot (2015, 
p. 29), to the dismantling of an aggregate defined as follows: “Archaeological sites 
[...] consist of accumulations of things that may have already had their unity for 
themselves and whose totality has not necessarily been thought of as such.” Prag- 
matically, it is a question of dismantling sedimentary accumulations formed by 
human actions and that contain, in addition to the sediment, material traces of 
these actions. The unitary value of the different accumulations — “layers”, called 
stratigraphic units — is recognized by the archaeologist according to intrinsic as well 
as extrinsic criteria, which introduces a subjective bias in observation, even though
the aggregate is a potentially incomplete, but objective whole. The non-iterative 
character of this dismantling cannot be overemphasized here: any damage to the 
accumulation is irreversible. Consequently, during the excavation, the archaeologist 
must note, “record”, both in written form and through drawing or photography, each 
of the elements that make it possible to explain why the accumulation has been 
considered as unitary. 

From these collected data, it is possible to switch, borrowing again from 
Boissinot’s terminology, to the phase in which the unitary accumulations are 
distributed over time, according to an interpretation that follows the laws of archae- 
ological stratigraphy stated by Edward C. Harris, the first of which, defined in 
geology as early as the seventeenth century, asserts that the lower layers are older 
than those above them. This ordering is initially relative, before being inserted into 
an absolute chronological framework based on the artifacts present in each of the 
units. It allows for the development of a narrative inscribed in time, which is 
indispensable for the subsequent development of interpretations, in that all the 
proposed restitutions of actions have a spatial and temporal context. 

Despite the attempts to make archaeology “more scientific” by resorting to 
hypothetico-deductive reasoning, deployed as a banner by the New Archaeology 
from the 1960s onwards (Binford 1962, 1965; Willey and Phillips 1958),1 this form
of interpretation is in fact completely foreign to the field of archaeology: unless one 
confines oneself to very broad levels of generality, the unique character of each 
aggregate, its destruction during observation and the impossibility of reconstituting 
the latter prohibit proceeding, in the strict sense, by hypotheses that are tested in 
order to be validated or invalidated.2 Without denying the scientific character of archaeology,
one can say that the interpretive processes deployed are first and foremost inductive: the
inductive arguments, inferred by generalizing from the observed facts, gain strength with
the multiplication of similar observations, without ever being able to
become certain, because not all the aggregates can be dismantled. Inductive
reasoning then relies heavily on formal analogies between observations made
at distinct sites. Given the importance of causal reflection, which seeks to
understand the functioning of past societies, abductive reasoning plays a relatively large
part in the interpretative processes deployed in archaeology.3


1For a critical view of this type of reasoning, see Cleuziou et al. (1973), especially pp. 41-44.


2Conversely, it is possible to present archaeological interpretations as the testing of a
model (hypotheses) validated by the observed facts. This does not go beyond mere rhetoric.


3The most obvious form of such abductive reasoning (among many) concerns catastrophic events
(fires and earthquakes in particular), especially if they are mentioned and dated in texts transmitted
by the manuscript tradition. For example, two texts, one by Tacitus (Annals, 15, 22), the other
by Seneca (Natural Questions, 6, 1, 1-2), report that the city of Pompeii was badly hit by an
earthquake in 62 or 63 CE (major premise). Any archaeological observation of damage that might
refer to an earthquake (minor premise) will “naturally” be associated with this earthquake attested in
the literary sources, with immediate consequences for the absolute chronology.
This interpretation by abduction becomes very problematic when the validity of the major premise
To summarize, archaeology is a science based on the observation of incomplete
elements (whose incompleteness may have been accentuated by the archaeologist),
and it readily resorts to analogy. Relying mainly on the material traces
unearthed during excavations, it aims, through a combination of abduction and
induction, to reconstruct the practices of past societies.

From this definition, however lapidary it may be, it is appropriate to question the
form(s) taken by experimentation in archaeology, in particular by asking how its
results, obtained a priori through modes of reasoning that are impossible in the
“normal” framework of archaeological practice, come to be integrated into the
body of archaeological knowledge.

Since we cannot claim to be exhaustive in our analysis of archaeological exper- 
iments, after a brief presentation of the history of this practice, we will rely on four 
examples, deliberately chosen from the early historical periods (Antiquity and, to a
lesser extent, the Middle Ages). Indeed, even though the chronological field of 
archaeology extends from the end of hominization to recent times, modern and 
contemporary (Hurard et al. 2014, pp. 3-9), or even very recent, including the 1960s 
(Weller 2014, pp. 40-44), the use of experimentation is less frequent for the ancient 
and medieval periods than for prehistory and protohistory, which allows for a more 
in-depth examination of how it is practiced.4


11.1 “Experimenting” in Archaeology: A Brief History of a Method


Before proceeding to a brief history of the use of experimentation in archaeology, it
is appropriate to clarify which practices are considered here. Thus, following the warnings
issued by Peter Reynolds (1999), one of the pioneers of large-scale protohistoric
experimentation in the United Kingdom, we will not consider as experiments the various
forms of cultural mediation generally carried out by “theater” companies that evoke
the daily life of past societies through costumed performance.5

In France, the first attempts at making lithic tools (Reich and Linder 2014) were carried out
in parallel with the first reconstructions of throwing machines (catapults,
onagers)6 made by Jean-Baptiste Auguste Verchère de Reffye for Napoleon
III, which were tested before being exhibited at the Musée des Antiquités nationales


comes into question, in this case with the evidence of “continuous” seismic activity between the 60s 
AD and the eruption of Vesuvius in 79. Cf. Monteix 2017, pp. 201-202. 

4On the one hand, I specialize in the ancient period. On the other hand, the objects of
experimentation in prehistory are generally much simpler than those of the later periods. The 
variables that can affect experimentation are thus fewer, which limits bias. 

5This practice is now frequently termed re-enactment.


6The onager is a stone-throwing device, while the catapult is a projectile-throwing device (Reinach
1926, p. 63).
de Saint-Germain-en-Laye. It was in the post-war period, particularly from the 1960s
onwards, that the practice of experimentation developed in prehistory, as an aid to
understanding the production of artefacts, in line with the pioneering work of
François Bordes, Jacques Tixier and Don E. Crabtree (1972).7

Experimentation quickly went beyond the reconstruction of objects under the
impetus of Sergei A. Semenov’s work, initiated in 1930 and widely disseminated
from 1964 onwards after its translation into English (Semenov 1964), thus giving
birth to traceology, the study of use-wear traces: the reconstructed artifacts are put back
into use, so as to create a reference against which the hypotheses formulated from
the study of excavated objects can be validated or invalidated. These new ways of
thinking extended to protohistory with the appearance of the first experimental 
centers open to the public, among which the Butser Ancient Farm, founded in 
1972 by Peter Reynolds in Hampshire, England (Reynolds 1999), can be considered 
a model. The aim was to use data from excavated sites to develop and implement 
experimental reconstructions, while welcoming wider audiences. At the same time, 
the epistemological framework of archaeology, especially for the pre- and protohistoric
periods, was profoundly renewed thanks to the contributions of ethnology in
general, which allowed for the emergence of ethnoarchaeology (Coudart 1992), and
of cultural technology in particular, which developed such important theoretical
concepts as the operational sequence and the notion of variants (Bartholeyns
et al. 2010).

From this too brief history of experimental practices in archaeology, we retain a
point central to the approach developed in this volume: notwithstanding the attempts
of the epigones of New Archaeology or, more recently, the assertions of certain
experimenters, experimentation as it is generally practiced in archaeology has not
served to establish the scientific character of the discipline. The contribution of
experimentation to the scientificity of archaeology can even be considered marginal,
as the practice of experimentation in archaeology remains isolated, both in terms of
implementation and especially of results, in particular for the periods after prehistory.


11.2 A Return to “Experimentation” 


To better understand the different facets of the heterogeneous practice of experimentation
in archaeology, we will briefly present four cases. Apart from one case conducted by
the present author, these examples were chosen because they are recent, because all or
part of each experiment, from the protocol to the results, has been published, and
because it was possible to discuss them with the experimenters to obtain
additional information.


7See the historiographical overview drawn up by Meignen and Texier (2011). 
11.2.1 The Gyptis, a Sailing Replica of a Ship of the Sixth Century BCE


In 1993, during the excavation of the Place Jules-Verne in Marseille, several 
shipwrecks were discovered. Among these, the wreck Jules-Verne 9, abandoned 
on the shore towards the end of the sixth century BCE, had the particularity of having
a hull assembled by tying together the different parts of the planking, thus forming a
“sewn” ship. The study was carried out according to the usual methods of naval
archaeology, including the creation of small-scale models and the restitution of
unpreserved parts based on other wrecks from the same period. At the end of this
study, a project for a sailing replica (a full-scale copy of Jules-Verne 9, using the
materials and techniques observed in the original) was included in Marseille-Provence’s
bid to become European Capital of Culture 2013.

The reconstruction took place in a shipyard specialized in wooden construction, 
after a test phase on a fraction of the hull that allowed researchers to reconstitute the
specific gestures used to make the hull, built plank first (shell-first construction) and “sewn”.
In October 2013, the Gyptis was launched in the Old Port of Marseille. 

The second phase of the experiment then began. The aim was to understand and
measure the nautical qualities of this replica: wind resistance, ability to go upwind,
and performance in terms of speed. Despite the main helmsman’s prior sailing skills,
it took him, and the rest of the crew, a year to master the ship satisfactorily.
The various navigation trials, under oars and under sail, were measured
using a GPS coupled with a wind vane-anemometer, which made it possible to produce
summary graphs showing the nautical capabilities of the Gyptis (Pomey
2014; Pomey and Povéda 2019; Pomey et al. 2015).


11.2.2 A Pompeian Oven at Saint-Romain-en-Gal (France, 69)


After six years devoted to the study of the Pompeian bakeries, during which we not only
surveyed the particularly well-preserved remains on the site but also excavated four of
these workshops, questions and hypotheses had been formulated. These concerned
the ovens: their forms, their methods of fuel supply (small pieces of wood and olive
pomace) and the duration of baking. To answer these questions, an oven was rebuilt
in late summer 2015, financed through cultural sponsorship by the Jacquet
company, in a space dedicated to experimentation at the Museum and site of
Saint-Romain-en-Gal.

The reconstructed oven is “typical” of Pompeian bread ovens, without being a 
“faithful” reproduction of one of them. During implementation, which benefited 
from the skills of stonemasons, the hypothesis that the dome was erected on a pile of
sand was tested, with mixed results: the baking chamber could be built, but it is quite
likely that this method of shaping the dome, clearly attested in Pompeii, was carried
out there in another way. Once the oven was completed, ten
thermocouples were installed at various points on the dome and sole. These were
connected to a continuous recording station while the oven was in operation, and
they made it possible to determine the temperature gradients in the masonry between
the inside and outside of the oven. Since construction, 31 baking sessions have
been carried out, using different types of fuel (olive pomace, softwood planks, and
vine shoots). Although several firings can be considered successful, learning to use
this oven is not yet complete, as researchers have been unable to establish
a mode of heat rise that reliably allows successful firings. This situation is largely
explained by the difficulty of supplying small-sized fuel (Coubray et al. 2019; Monteix
et al. 2015).8


11.2.3 Cutting and Butchering — D. Coupes Project 


To better understand the butchery marks observed on bones discovered in archaeological
contexts from the Iron Age to the Middle Ages, whose serial analysis has
made it possible to propose hypothetical cutting patterns for each period, a butchery
experiment was conducted on animals: three dogs, a cat and two horses.9 The
different carcasses were cut using tools (cleavers, choppers, knives) recreated by a
blacksmith following ancient excavated models. Although the experimentation was
conceived by three archaeozoologists, specialists in the study of archaeological 
bones and the traces they present, the experiment benefited greatly from the skills 
of a veterinarian specialized in anatomy and above all from a certified butcher, both 
also archaeozoologists. The latter was responsible for most of the cutting and slicing 
of the carcasses. 

In addition to a video recording to capture the gestures, each of the cut pieces was 
photographed and weighed. At the end of the experiment, all the bones were 
numbered, cleaned by boiling and dried so that they could be studied alongside 
comparative archaeological collections. Although the data are still being analyzed, it 
is already possible to emphasize that some of the hypothetical cutting patterns, 
proposed from the traces observed on the bones, have been confirmed and/or 
strengthened. Some points of detail — such as the causes of enamel detachment on 
dog canines and incisors — have, however, been revised after invalidation of the 
initial archaeological hypothesis (Horard-Herbin et al. 2017). 


8For the initial results of these firings, see Monteix and Noûs (2021).

9To respect the rules defined by the ethics committee of the École Nationale Vétérinaire de Nantes,
where this experiment was carried out, the slaughter was not conducted according to ancient or 
medieval techniques. The animals were euthanized and studied prior to the cutting experiment by 
veterinary students from Nantes. 
11.2.4 Excavation of an Experimental Ceramic Firing Space at Bélesta (France, 09)


In Bélesta-de-la-Frontière, experimental firings of ceramic objects have taken place
every year since 1984 during the Journées de la céramique (Ceramic Days). The 
various potters’ kilns that have been reconstructed over the years have mainly been 
based on medieval examples, but kilns based on ethnographic observations have also 
been implemented. A contemporary Portuguese kiln was rebuilt in 1997 and oper- 
ated on a yearly basis until 2004. We will not dwell here on the conditions of these 
experiments, nor on their results. 

After eleven years of abandonment, the Portuguese kiln was turned over to 
archaeologists to excavate, following the protocols and questions that the uncovering 
of such a structure would normally raise. The only difference from a normal
excavation was that the archaeologists knew from the start what they were going
to uncover (a functioning potter’s kiln), but nothing more. At the end of
the excavation, the archaeologists’ interpretations were compared with the
observations made during the experiments and with the experimenters’ memories. While
some of them proved to be quite accurate, others, especially those related to the restitution
of the kiln’s architecture, clearly pointed to over-interpretation (Allios and Cornet
2019).10

With this example, we have moved away from experimentation in archaeology to 
a true experimental archaeology, where it is the disciplinary practices that are the 
subject of the experiment and not the hypotheses formulated at the time of the 
discovery of artefacts during excavation.11


11.3 Understanding Archaeological “Experiments”


Although not exhaustive, the four cases mentioned allow us to draw a portrait 
of experimentation in archaeology, at least in its application to (proto)historical 
periods. 


10The over-interpretation here was due to the restitution of a dome in the absence of any element
relevant to this form of kiln cover. The over-interpretation was also a matter of risky abduction,
since most kilns are indeed equipped with a dome. In fact, the experimental kiln used a
precarious cover made of sand and sheet metal. “The weight of the theoretical model was so strong
that it conditioned the analysis of the excavation” (Allios and Cornet 2019, p. 62).

11The excavation of an experimental space, insofar as it was carried out blindly, without prior
knowledge of the experiment being conducted, does not fall into the category of rigged aggregate in
Boissinot’s (2015) sense. The questions raised prior to dismantling and the results obtained are thus
as valid as those of any other excavation.
Archaeology, a science mainly based on observation, makes little use of experimentation
in the strict sense given by Claude Bernard.12 Of our four examples, only
the excavation of the kiln at Bélesta corresponds to this method: the facts observed
during the excavation were judged against the control facts resulting from
the actual and known use of the excavated kiln. Compared to the normal practice
of excavation (and even of experimental reasoning), there is, however, an important
chronological reversal: the control facts were obtained prior to the excavation. It is
this chronological inversion that limits the application of experimentation in the
strict sense in archaeology.

As for the other “experiments”, they fit into experimental practice in a somewhat
degraded way, to a degree that depends on the ratio between the number of variables
explicitly questioned and the number of variables implicitly recognized as controlled. The
butchery trials are probably the closest to experimentation in the strict sense:
the animals cut up are (apart from variations in size, see Duval and Clavel 2018; Lepetz
and Zech-Matterne 2018) identical to those whose remains are observed in
excavations, as are the instruments used for cutting, forged on ancient models.13
With the reconstruction of the Pompeian oven, the relationship between variables 
questioned and recognized as mastered changes. While this reconstruction leaves no 
room for hypothesis — Pompeii’s ovens are generally completely preserved — it 
assumes that the Toulouse bricks used to build the hearth and the tiles used for the 
dome, all obtained by contemporary preparation and firing methods, are equivalent 
to their ancient models, even though it is quite likely that they react, even marginally, 
differently to heat conduction. Moreover, this reconstruction is only functional if one 
accepts that the formal analogy is sufficient to validate the main tool of the experiment.
The same type of degradation affects the Gyptis, whose root structure is based
on another wreck, while its mast and rigging remain hypotheses based on iconographic
sources. It might appear paradoxical to test experimentally the functioning of
an essential component of the ship whose restitution is itself only hypothetical; however,
since this restitution is recognized as valid in the scientific field, the results of the
experiment are considered equally valid.

These deformations of experimentation stricto sensu necessarily affect the form
of the experiment itself. The distinction that we have drawn between the butchering
on the one hand, and the reconstructions and uses of the oven and the Gyptis on the 
other, overlaps with the two forms of experimentation proposed by Marianne 


12Observational and experimental sciences share a common mode of reasoning: “To learn, one must
necessarily reason about what one has observed, compare the facts and judge them by other facts 
that serve as a control” (Bernard 1865, p. 30). The experimental sciences resort to experimentation 
in the strict sense: “The experiment is the investigation of a phenomenon modified by the 
investigator” (ibid., p. 29). In archaeology, the controlled facts are prior and external to the specific 
object of study. 

13It should be noted, however, that we find here a variable that is implicitly recognized as being
under control: the identity of form of the cutting tools tacitly implies an identity of functioning and 
mode of manipulation. While such reasoning would surprise any self-respecting ethnologist (Sigaut 
1991), it can, in this case, be quite legitimately accepted by archaeologists. 
Rasmussen (2001). In the first, called “controlled archaeological experiment”, generally
conceived as the experimental form closest to the “scientific” model
(Kelterborn 2001), the largest possible number of variables is isolated, and the 
variables are changed from one experiment to the next in such a way as to allow 
measurable and reproducible results to be obtained, which make it possible to isolate 
the variable(s) producing the effect studied. This large group includes both the 
butchery cuts described above and experiments on the manufacture of prehistoric 
flint tools (Pelegrin 1991). 

The second form of experimentation, called “contextual archaeological experi- 
ment” by Rasmussen, makes less allowance for the isolation of variables. Its purpose 
is to acquire data by observation, to test the material efficiency of a hypothesis. The 
experiments conducted on the replicas of the oven and the Jules-Verne 9 wreck are 
of this kind. Other examples, such as the ancient coinage made at Melle (Faucher 
et al. 2009, 2012), also fit this scheme. In these experiments, it is less a question of 
understanding the importance of this or that variable through a set of successive trials 
than of answering, with varying degrees of precision, a question about the functioning
of an artifact or a system of artifacts. The more complex the techniques under study,
the less the variables can be isolated and the greater the shift towards this contextual
form of experiment.

By analyzing the process of archaeological experimentation macroscopically — 
and not by proposing an improbable protocol with supposedly universal value — we 
can understand its mechanisms and the biases that can affect it (Fig. 11.1). 

When it is practised — let us emphasize once again that this practice remains 
marginal — experimentation in archaeology proceeds from the interpretations pro- 
posed by the archaeologist. As we noted in the introduction to this chapter, these 
interpretations are in fact reconstruction hypotheses based on a subjective and 
therefore imperfect observation of initially objective (even if generally incomplete) archaeological facts. Questioning these interpretative hypotheses may lead to a material experiment that seeks to answer one of two questions: “Is this artifact produced like this?” or “Does this artifact function in this way?”.¹⁴ The first question will almost automatically call for a controlled experiment, the second for a contextual form of experimentation. These two questions are thus almost mutually exclusive, yet the answer to the first must be positive before the second can be answered; this can become a bias if it has not been possible to verify that answer experimentally.

If the number of variables to be tested is small and the artifact studied is not a complex object, it will be possible to proceed directly to a controlled experiment, without any intermediate stage other than the phase of appropriating the technique involved, to which we shall return.


¹⁴ In both cases, the comparison refers to the hypothesis formulated on the basis of archaeological data.
Fig. 11.1 From excavation to experimentation in archaeology: stages and biases


This is how the D. Coupes project was able to question quickly¹⁵ how the cut marks observed on archaeological bones were produced, and to test the different hypotheses by varying the cutting instruments.

However, both experiments had to go through a reconstruction phase during 
which experimenters produced the artefact whose functioning was questioned. 
During this phase, which is specific to contextual experiments, biases appear, linked to the passage from the interpretation and reconstruction proposal to its implementation. To carry out the reconstruction properly, each proposed hypothesis should be validated throughout the process. As soon as we deal with a complex technique or an ancient machine, however, these hypotheses accumulate to the point where testing them one by one becomes impossible.¹⁶ However numerous and varied they may be, these biases are acceptable as long as they are recognized and explained. They restrict the overall scope of the experiment without, however, nullifying it.
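As a rough illustration of the minimum count given in the accompanying note, assume (a simplifying assumption made here for illustration, not a claim from the projects discussed) that each reconstruction hypothesis is binary, either retained or rejected, and independent of the others. The number of combinations N that would have to be tested then grows exponentially with the number of hypotheses n:

```latex
N = 2^{n}, \qquad \text{for example } n = 10 \;\Rightarrow\; N = 2^{10} = 1024
```

Ten hypotheses, an illustrative figure rather than one taken from the oven or the Gyptis, would already imply more than a thousand distinct configurations, which is why each hypothesis cannot be tested in isolation.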


¹⁵ Logistical and financial problems notwithstanding.
¹⁶ At a minimum, the number of possibilities resulting from this multiplication of hypotheses is 2ⁿ, where n is the number of hypotheses.
It is this progressive shift towards what can be considered experimentation that is reflected in the terminological choices made in naval archaeology. A distinction is thus made between, on the one hand, restitutions unsuitable for experimentation, whether based on little data and without concern for scale (recreations) or based on archaeological data for questions of form but built in any material (reproductions), and, on the other hand, reconstructions suitable for experimentation, which are either based on data from several ships or are true copies (replicas) preserving the scale, materials and techniques of an ancient ship (Fenwick 1993): from the recreation to the replica,
biases diminish considerably. Even in a reconstruction such as the Gyptis, some biases remain, such as the use of portable electric tools to drill the holes through which the components of the hull were “sewn” together.¹⁷ This bias, acknowledged and made explicit, did not compromise the purpose of the exercise (to experiment with navigating a replica of a ship from the sixth century BCE), but it does prevent us from drawing conclusions about the time needed to build the ship.
During the construction of the oven, not only were the ceramics (bricks and tiles) used for the dome and the hearth of modern manufacture, but it was also impossible to obtain blocks of basalt from Orvieto (Italy) identical to those of the Pompeian models; a geologically similar stone, Volvic stone, was used instead. The differences in composition between the materials used and their ancient models were judged small enough not to compromise our understanding of how heat is acquired and diffused between the different parts of the oven; at most, the temperature variations recorded during the firings should be treated as orders of magnitude.

When the experiment actually begins and the archaeologist takes on the role of experimenter, other biases appear. They are linked to the inevitable appropriation of the technique, which occurs in controlled and contextual experimentation alike: in any technical action, the result depends largely on the know-how of the actor. Depending on the know-how already acquired by the person conducting the experiment, it will be necessary either to acquire new knowledge specific to the technique being tested or, on the contrary, to partially unlearn the gestures inherited from previous practice.

In the D. Coupes project, the experience of Christian Vallet, who is not only an archaeozoologist but also experienced in contemporary butchery practices, made it unnecessary to train someone in cutting during the experiment, and thus avoided losing the first experiments, which would almost certainly have produced inconclusive results until the individual had acquired efficient gestures. Moreover, according to the experimenter, the gestures practiced during these cutting experiments were not influenced by contemporary cutting practices, thanks to archaeozoological knowledge of the cutting points, which differ radically between Antiquity and our

¹⁷ “While it was necessary to restore a particular tool for the cutting of the tetrahedral recesses, the drilling of the 10,000 ligature points was finally carried out with an electric drill, which did not call into question the principles of construction but saved considerable time” (Pomey and Povéda 2019, p. 22).
days. At most, there was a brief period of adaptation to the knife handles, which were made of bare metal without a wooden covering and were therefore less comfortable to handle. Moreover, the framework imposed by experimental practice generated some minor biases in the implementation: the cutting support was metal rather than wood, which makes the gesture less flexible; and the gestures were necessarily discontinuous, because of the need to document them photographically and/or to verbalize each gesture as it was being made, so as to make it explicit to the observers.

Biases related to the appropriation of the technique can also be found in contextual experiments. As mentioned above, Pierre Povéda, the main helmsman of the Gyptis, took a year to master the boat, navigating between his previous experience and the necessary adaptation to a type of ship different from those he had sailed before. He had to unlearn some of his nautical skills: on a ship with a square sail, the rudder plays a smaller role in balancing or even steering the vessel. It was therefore necessary to return to the basic sailing-school exercises in which one steers a boat without a rudder, balancing it with the sail.

Before the first baking in the reconstructed Pompeian oven, the experimenter had practically never used a bread oven, and certainly never to bake bread after heating the dome sufficiently and removing the fuel. The first firings were in fact mainly devoted to acquiring, through trial and error, a technique that would raise the temperature sufficiently. After about thirty firings, this phase of technical appropriation has only just been completed. The personal experience thus developed, which will eventually be deployed in the rigorously experimental phase, remains strictly contemporary and does not claim to reproduce the heating techniques that may have been practiced in Pompeii in the third quarter of the first century CE. The quantities of fuel used, and the time their combustion takes to raise the temperature sufficiently, will at best be indicative data, which we will assume to be close to those of ancient practice.

It should be emphasized that these biases in the appropriation of the technique are not instances of the experimenter effect, whereby the experimenter influences the reading of the results according to his own hypotheses. The existence of a
bias in the realisation of the technical gesture remains theoretical, however highly 
probable it may be. In the absence of knowledge of the gestures performed during 
historical periods, it is impossible to demonstrate the existence (or absence) of this 
bias. It is important, however, to recognize it, even if it means limiting the scope of 
the experimental results. 

Experimentation in archaeology rarely makes it possible to truly demonstrate anything.¹⁸ It does, however, yield indications, in the form of proxy variables, on material questions that would otherwise be neither measurable nor directly observable. Beyond these indications, and thus beyond obtaining an

¹⁸ In this regard, one of the principal publications on the practice of experimentation in naval archaeology states: “No experiment can ever prove a hypothesis: it can either disprove it or produce results consistent with its predictive statements. In the latter case, the hypothesis may remain defensible until it is disproved, or accepted as a theory after it has been recognized as explaining the data collected” (Coates et al. 1995, p. 297).
answer to the question initially asked, the experimentation phase itself is also a phase of learning and observation. Rarely in this discipline do archaeologists have such an opportunity: those who experiment directly confront the reality of phenomena that they generally study through various filters, which strip those phenomena of their physical substance. This confrontation makes it possible to eliminate conceptual preconceptions, often unexamined, formed during studies usually carried out on disembodied remains from behind a desk in a laboratory or a library: direct experience with the material then comes fully into play and makes it possible to take certain elements into account and integrate them into ongoing reflections.

Thus, it is relatively easy to mention the evisceration phase of the slaughtered 
animals by relying as much on traces linked to the opening of the rib cage as on 
iconographic sources, and then to mention the importance of the viscera for the 
consumption of gods and men (Lepetz 2007). Direct experience, during an experiment, of extracting, handling and managing a hundred kilos of viscera from a horse carcass¹⁹ forces the experimenter to perceive this operation differently, particularly in terms of logistics, and all the more so if we imagine the successive sacrifice of several of these animals during certain ceremonies.

Similarly, before starting to use a bread oven, I wrote the following about the moment preceding the loading of the loaves: “Therefore, during the baking, the fuel had necessarily to be removed from the oven”, without further detail (Monteix 2010, p. 158). This carelessness, verging on levity, shocked neither me nor my readers. When I became an experimenter and was confronted with the need to remove the embers from the oven, the situation changed considerably. To clear the embers remaining on the hearth, I had to use an iron tool with a wooden handle, found in a state of disrepair near a bread oven dating from the beginning of the twentieth century. Since then, evidence of such a wooden tool has been found at Eschenz (Switzerland), a site where no bread oven is currently known.²⁰

Finally, it is important to point out the main difficulties encountered in implementing experimentation in archaeology. Trivial as it may seem, the first of these is financial: this practice is expensive, especially compared with the funds generally available for programmed (“fundamental”) research in archaeology, and all the more so for contextual experiments that require complex reconstructions. The reconstruction of the Pompeian oven, which cost a total of €22,000, was only made possible through sponsorship by Jacquet, a company active in heritage restoration. The reconstruction of the Gyptis cost €552,000, more than half of which was provided by the PACA Region in connection with the events of Marseille, European Capital of Culture (Nouvel 2013). Beyond financing, the second difficulty is logistical. Whatever its nature, experimentation in archaeology


¹⁹ Jean-Philippe Corbellini and Marie-Pierre Horard-Herbin, Le projet D. Coupes, MSH Val de Loire, 2014.

²⁰ The interpretation of the wooden pieces was proposed on the basis of an ethnographic parallel. See Tasgetium II. Die römischen Holzfunde, Departement für Erziehung und Kultur des Kantons Thurgau, p. 110, cat. no. 217.
requires spaces, whether for the reconstruction itself or for the maintenance of the reconstructed artifact, that can be used over the long term for experimentation and its necessary iterations. Such spaces are quite rare in France: fifteen years after the closure of the Archéodrome de Beaune, few parks still host such activity for experiments concerning the historical periods. Without claiming to be exhaustive, and leaving aside places where public outreach takes precedence over experimentation, we can mention only the museum and Gallo-Roman sites of Saint-Romain-en-Gal (69) and the experimental platform at the silver mines of Melle (86).


11.4 Conclusion 


If it were necessary to summarize in a succinct formula what constitutes experimentation in archaeology, “making in order to (try to) understand” would be particularly appropriate. That experimentation does not produce a “demonstration” does not make this practice useless. When controlled, it yields data that can contribute to the regime of “proof” in archaeology, a regime that, as in all social sciences based on observation, is fluid and full of hypotheses. When it is contextual, its contribution lies rather in the production of proxies: the data obtained allow us to propose orders of magnitude for the phenomena studied, orders of magnitude that must not be generalized but should be borne in mind in order to better frame archaeological discourse.

As experiments become more complex, the degree of vagueness in the manipulation of variables tends to increase, which introduces minor biases into these archaeological experiments, especially in the initial phases of reconstruction. These biases nevertheless remain within the bounds of acceptable practice, provided they are recognized and justified by reasoning that conforms to that of the discipline, and provided they do not conflict with the way the results are exploited. The main bias lies in mastering the technique deployed, which has to be learned or unlearned: it is impossible to be certain that this technique corresponds to the gestures performed by the Ancients, even if it leads to an identical result. Experimentation in archaeology is therefore first and foremost an extension of the field of observation, and it rarely lies at the core of archaeological practice. It reinforces the credibility of the abductive reasoning carried out in other archaeological studies and, above all, opens up the field of possibilities.